This report describes the methodology employed by the Quantum Resource Estimator
(QuRE) toolbox to quantify the resources needed to run quantum algorithms on quantum
computers with realistic properties. The QuRE toolbox estimates a number of
quantities, including the number of physical qubits required to run a specified quantum
algorithm, the execution time on each of the specified physical technologies, the
probability of success of the computation, and physical gate counts with a breakdown by
gate type. Estimates are performed for error-correcting codes from both the
concatenated and topological code families. Our work, which provides these
resource estimates for a cross product of seven quantum algorithms, six physical
machine descriptions, several quantum control protocols, and four error-correcting codes,
represents the most comprehensive resource estimation effort in the field of quantum
computation to date.


1  Introduction 

Estimating  the  running  time,  number  of  qubits  and  other  resources  needed  by  realistic  models 
of  quantum  computers  is  the  first  necessary  step  to  reducing  these  resource  requirements.  This 
report  describes  our  Quantum  Resource  Estimator  (QuRE)  toolbox  which  we  used  to  calculate 
resource  estimates  for  a  cross  product  of  several  quantum  algorithms,  quantum  technologies, 
and  error-correction  techniques.  The  focus  of  this  work  is  on  the  estimation  methodology, 
overhead  caused  by  error  correction,  and  the  software  tools  that  we  developed.  Our  toolbox 
simulates error correction with the Steane code [1,2], Bacon-Shor code, Knill’s post-selection
scheme,  and  surface  code,  representing  codes  from  both  the  concatenated  and  topological 
error-correcting  code  families. 

The  QuRE  toolbox  is  implemented  as  a  suite  of  Octave  scripts.  The  inputs  for  the  QuRE 
toolbox  are  the  description  of  the  physical  properties  of  the  quantum  computer  (such  as 
gate  error  rate  and  gate  time),  and  logical  resource  requirements  of  the  quantum  algorithms 
(such  as  number  of  logical  qubits).  The  tool  automatically  generates  resource  estimates  for  a 
cross product of algorithms, quantum technologies, and error-correction techniques. For each
combination,  the  toolbox  reports  detailed  information  including  the  level  of  concatenation  or 
code  distance  needed  to  achieve  at  least  50%  circuit  reliability,  the  actual  circuit  reliability, 
running  time  of  the  algorithm,  number  of  physical  qubits,  and  a  gate  count  with  a  breakdown 
by  gate  type.  For  error-correcting  codes  that  use  ancilla  factories,  we  also  report  the  size 
of  the  ancilla  factory,  number  of  gates  used  by  the  ancilla  factory,  and  the  time  needed  to 
prepare  one  ancillary  state. 

In  this  work  we  consider  a  wide  range  of  physical  quantum  technologies  with  realistic 
properties,  each  with  several  choices  of  quantum  control  protocols.  The  properties  of  these 
technologies have been studied by Hocker et al. [3]. As shown in Section 3, the gate error rates
for the worst physical gate in these physical technologies range from approximately 10% for
Photonics I with primitive control down to $3.19 \times 10^{-9}$ for ion traps with primitive control.
Since the error-correction threshold for concatenated codes is typically in the $1 \times 10^{-3}$ to
$1 \times 10^{-5}$ range, many of the models do not meet the threshold, making concatenated
error-correcting protocols unusable for them. On the other hand, the surface code, which has a
threshold of around 1%, meets the requirements of a majority of the models.
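
The threshold comparison above can be sketched in a few lines of Python. This is only an illustration, not part of the QuRE scripts: the error values are the worst-gate probabilities listed in Table 2, the three threshold values are the ones quoted in the paragraph above, and the names `WORST_GATE_ERROR` and `below_threshold` are hypothetical.

```python
# Worst-gate error probabilities as listed in Table 2 of this report.
WORST_GATE_ERROR = {
    ("Quantum Dots", "Primitive"): 9.89e-1,
    ("Neutral Atoms", "Optimal"): 8.17e-3,
    ("Neutral Atoms", "Primitive"): 8.12e-3,
    ("Neutral Atoms", "Solovay Kitaev"): 1.77e-3,
    ("Neutral Atoms", "Trotter"): 1.47e-3,
    ("Photonics I", "Primitive"): 1.01e-1,
    ("Photonics II", "Primitive"): 5.20e-3,
    ("Superconductors", "Optimal"): 6.56e-4,
    ("Superconductors", "Primitive"): 1.00e-5,
    ("Ion Traps", "Dyn. Cor. Gates"): 7.99e-3,
    ("Ion Traps", "Optimal"): 2.93e-7,
    ("Ion Traps", "Primitive"): 3.19e-9,
}

# Threshold values quoted in the text; real thresholds depend on the code
# and on the noise model, so these are rough illustrative bounds only.
THRESHOLDS = {
    "concatenated (upper end of range)": 1e-3,
    "concatenated (lower end of range)": 1e-5,
    "surface code": 1e-2,
}


def below_threshold(threshold):
    """Return the (technology, control) pairs whose worst gate beats the threshold."""
    return sorted(k for k, p in WORST_GATE_ERROR.items() if p < threshold)


for name, threshold in THRESHOLDS.items():
    usable = below_threshold(threshold)
    print(f"{name}: {len(usable)} of {len(WORST_GATE_ERROR)} models below threshold")
```

Under these assumptions most models fall below the surface code threshold, while only a few fall below even the more lenient concatenated-code threshold, which is the qualitative point made in the paragraph above.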

The QuRE toolbox is preloaded with information about a variety of quantum algorithms,
including the binary welded tree algorithm [4], boolean formula algorithm [5], ground state
estimation algorithm [6], quantum linear systems algorithm [7], shortest vector algorithm [8],
quantum class number algorithm [9], and the triangle finding problem [10]. We chose these
algorithms because their logical resource requirements are known and have been analyzed
in the scope of IARPA’s Quantum Computer Science program. The above-referenced
reports show the total circuit gate count, a detailed breakdown by gate type, and information
about parallelization for each of these algorithms. Moreover, the algorithms cover a range of
quantum computation primitives such as quantum Fourier transform, quantum simulation,
amplitude amplification, phase estimation, quantum walk and sieving. It should be noted
that the logical resource requirements of the selected algorithms vary widely. For example,
the binary welded tree algorithm requires 1,220 logical qubits and $5.57 \times 10^{10}$ logical gates,
whereas the shortest vector algorithm, which addresses a problem believed to be hard for
quantum computation, requires $4 \times 10^{18}$ qubits and $2.03 \times 10^{22}$ gates.

To accurately estimate the total physical resources, the QuRE toolbox heeds the locality
constraints of quantum technologies: two-qubit CNOT gates can only be performed locally
on two neighboring physical qubits. To that end, we use a tiled qubit layout for concatenated
codes. Each tile contains physical qubits that represent the state of a single fault-tolerant
logical qubit. To perform CNOT gates inside each tile, either SWAP gates or ballistic
movement must be used to move the two interacting qubits together. Since movement reduction is
clearly desirable, and different error-correcting codes have different structure, we use a custom
qubit layout for each of the three concatenated error-correcting codes. For the Steane code, we use
the optimized qubit layout introduced by Svore et al. [11], and for the Bacon-Shor code the
optimized layout of Spedalieri et al. [12]. Since there is no known optimal layout that reduces
movement for Knill’s post-selection error-correcting scheme, we designed our own. QuRE
also uses a tiled qubit layout for the surface code, where a pair of holes representing a logical
qubit resides inside a tile. However, computation and error correction with the surface
code are inherently local, and no swapping of qubits is needed.

The ion trap technology supports reliable ballistic movement of qubits. Our resource
estimation exploits the favorable properties of ballistic movement, which has three benefits.
First, avoiding the need to use SWAP gates reduces the gate
count. Second, the reliability of the move operation improves the error-correction threshold
for concatenated codes, allowing the use of fewer levels of concatenation. Finally, the move operation
is much faster than a SWAP gate, reducing the execution time.

This report is organized as follows. Section 2 provides a high-level overview of the QuRE
toolbox. In Section 3 we describe the properties of the physical technologies used by the
toolbox, and in Section 4 we summarize the logical resource requirements of the studied quantum
algorithms. In Section 5 we discuss the generic aspects of resource estimation that pertain
to concatenated codes: the tiled qubit layout, finding the optimal concatenation level,
the overhead associated with correcting memory errors, and the specifics of ballistic movement. In
Section 6 we describe the specific qubit layout required by the Steane code, analyze the
overhead imposed by error correction using the Steane code, and quantify the number of
elementary operations required to carry out certain operations at a specified concatenation level.
In Sections 7 and 8 we repeat the analysis for the Bacon-Shor code and Knill’s post-selection
scheme. The resource requirements of quantum computation with the surface code are
discussed in Sections 9 through 13. In Section 14 we provide some details about the Octave
scripts that were written to implement the resource estimation. Finally, in Section 15 we
show select numerical results that illustrate the resource estimation methodology developed
in this document.

2  Functionality  of  the  QuRE  Toolbox 

Here we present a brief overview of the functionality of the QuRE toolbox. QuRE is
implemented in Octave and uses a modular design to allow easy extensibility.

Figure  1  shows  a  schematic  view  of  the  major  components  of  QuRE.  At  the  heart  of 
the  tool  is  the  Main  Loop,  which  iterates  over  all  specified  quantum  algorithms,  quantum 
technologies,  and  quantum  error-correcting  codes.  For  each  combination,  the  Main  Loop 
calls appropriate modules that load information about the algorithm, technology and
error-correcting code.
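
The iteration performed by the Main Loop can be pictured with a short Python sketch. QuRE itself is written in Octave; the function `main_loop`, the placeholder estimator, and the identifier strings below are hypothetical stand-ins used only to show the cross-product structure.

```python
from itertools import product


def main_loop(algorithms, technologies, codes, estimate):
    """Call estimate(algorithm, technology, code) for every combination."""
    results = {}
    for alg, tech, code in product(algorithms, technologies, codes):
        results[(alg, tech, code)] = estimate(alg, tech, code)
    return results


def placeholder_estimate(alg, tech, code):
    # Stand-in for the specification modules and resource estimators;
    # a real estimator would return qubit counts, run time, gate counts, etc.
    return {"algorithm": alg, "technology": tech, "code": code}


# Hypothetical identifiers, chosen only for this illustration.
algorithms = ["binary_welded_tree", "shortest_vector"]
technologies = ["ion_traps_primitive", "superconductors_optimal"]
codes = ["steane", "surface"]

results = main_loop(algorithms, technologies, codes, placeholder_estimate)
print(len(results), "combinations evaluated")
```

Keeping the estimator as a parameter mirrors the modular design described above: adding an algorithm, technology, or code only extends one of the input lists.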

An Algorithm Specification module provides information about the number of logical
qubits the particular algorithm needs. Logical qubits are the fault-tolerant qubits,
each built from a greater number of unreliable physical qubits. The module also specifies the
number of logical gates and simplified information about the circuit parallelism. Section 4 describes
how these specifications were obtained for the algorithms preloaded in the QuRE toolbox.

A Technology Specification module describes properties of a particular physical
quantum technology. It specifies the time needed to carry out each physical gate, the error of the
worst gate, and the memory error rate per unit time. Note that by a physical
gate we mean a non-fault-tolerant quantum gate that the technology provides
by executing one or more instructions. More details about the quantum technologies
preloaded into QuRE that are used in this paper are in Section 3.

An Error Correction Specification module is provided for each supported error-correcting
code. It quantifies the time and the number of physical gates needed to implement a logical
gate of each type at an arbitrary level of concatenation (or, in the case of topological codes, for
an arbitrary code distance). The module also quantifies the overhead caused by magic state
distillation, ancilla preparation, or any other operations pertinent to the particular
error-correcting code. The methodology used to quantify these metrics for the concatenated and
surface codes is described in Sections 6 through 13.

Fig. 1. Functionality of the QuRE toolbox.

The  Concatenated  Code  Resource  Estimator  reports  the  resources  needed  by  an 
algorithm,  technology,  and  concatenated  code  loaded  by  the  Main  Loop.  First,  the  module 
determines  the  minimum  concatenation  level  that  is  sufficient  to  complete  the  algorithm 
successfully with high probability. Then it evaluates the resources needed to carry out
fault-tolerant operations at that concatenation level, multiplies them by the number of operations
used  by  the  algorithm,  evaluates  additional  resources  needed  for  magic  state  distillation,  and 
reports  the  results. 
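
The first step, choosing the minimum concatenation level, can be sketched as follows. This is a minimal illustration and not the QuRE implementation: it assumes the textbook scaling for concatenated codes, in which the logical error rate at level L is p_th (p / p_th)^(2^L), and it assumes that every logical gate fails independently. The function names and the threshold value used in the example call are hypothetical; the 50% target is the circuit reliability mentioned in the Introduction.

```python
import math


def logical_error_rate(p, p_th, level):
    """Assumed textbook scaling: p_L = p_th * (p / p_th) ** (2 ** L)."""
    return p_th * (p / p_th) ** (2 ** level)


def min_concatenation_level(p, p_th, num_logical_gates, target=0.5, max_level=10):
    """Smallest level whose estimated circuit reliability reaches the target.

    Returns None if the physical error rate is not below the threshold
    or if no level up to max_level is sufficient.
    """
    if p >= p_th:
        return None
    for level in range(max_level + 1):
        p_l = logical_error_rate(p, p_th, level)
        # (1 - p_l) ** num_gates, computed via log1p to keep precision for tiny p_l.
        success = math.exp(num_logical_gates * math.log1p(-p_l))
        if success >= target:
            return level
    return None


# Example inputs: worst-gate error of ion traps with optimal control (Table 2),
# an assumed threshold of 1e-4, and the binary welded tree gate count.
level = min_concatenation_level(p=2.93e-7, p_th=1e-4, num_logical_gates=5.57e10)
print("minimum concatenation level:", level)
```

Once the level is fixed, the per-operation costs at that level are multiplied by the operation counts of the algorithm, as the paragraph above describes.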

The  Surface  Code  Resource  Estimator  reports  the  resources  needed  by  the  surface 
code  and  works  analogously  to  the  Concatenated  Code  Resource  Estimator. 

3  Physical  Quantum  Computation  Architectures 

The  properties  of  the  quantum  technologies  used  in  the  QuRE  toolbox  were  studied  by  Hocker 
et  al.  [3].  Their  work  describes  six  choices  of  a  technology,  and  for  some  technologies  they 
study  several  possible  choices  of  a  quantum  control  protocol.  They  quantify  the  durations 
and errors of all basic gates, as well as memory errors that disrupt the state of idle qubits due
to  interactions  with  the  environment.  The  quantum  technologies  were  modeled  by  Hocker  et 
al.  [3]  by  mapping  the  complicated  system  dynamics  of  each  architecture  onto  a  simplified, 
spin-based  Hamiltonian.  Noise  effects  were  simulated  with  a  Markovian  master  equation  to 
capture  dissipative  and  dephasing  effects,  while  stochastic  errors  were  used  to  capture  control 
errors  and  environmental  errors  typically  modeled  with  an  open  quantum  system  that  is 
dynamically  coupled  to  a  bath. 

Here  we  briefly  summarize  the  six  quantum  technologies,  focusing  on  the  quantities  needed 
for  our  resource  estimates  -  the  time  to  perform  one-  and  two-qubit  gates,  and  gate  and 
memory  errors.  Some  of  the  chosen  models  are  among  the  most  promising  candidates  suitable 
for  building  a  large-scale  quantum  computer.  At  the  same  time,  these  models  represent  a 
range  of  properties  that  the  future  quantum  computer  may  possess  -  for  example,  very  fast 
but  error  prone  superconductors,  slower  but  more  reliable  ion  traps,  and  neutral  atoms  with 
only  average  speed  and  average  error  properties  due  to  atomic  movement  of  qubits. 

Neutral Atoms [13]: In this technology, qubits are represented by ultracold atoms
trapped in an optical lattice. Light waves are used to trap and control the particles. Compared
to technologies with ion traps that trap charged ions in a magnetic or optical field, the
"ultracold" atom properties lead to their stability and great noise resilience. Ultracold atoms
are thermally isolated and experience very long T1 and T2 times. The dominant errors
arise from difficulties in precisely tuning the laser to a specific physical qubit position in the
lattice due to atom motion inside the lattice, i.e., it is difficult to hit the moving qubit with
a laser. The position offset causes an effective control error.

Superconductors [14,15]: There are many different types of superconducting qubits.
The type considered here is a superconducting phase qubit. The primitive building block for
qubits is the Josephson junction. Single- and two-qubit gates are based upon low-frequency
flux pulses or shaped GHz-frequency microwave pulses, and state readout is based upon
a single-shot switched measurement. Superconducting qubits have very short gate times of
tens of nanoseconds. State preparation is based upon reset by dissipation, and is therefore
a relatively slower operation. Hocker et al. [3] attribute the relatively higher error rates
to Markovian noise (a combination of radiation leakage into the Josephson junction, circuit
defects, and engineering limitations upon insulating the system). The Markovian noise cannot
be reduced using pulse shaping techniques that maintain constant control resources. Reducing
gate times at the expense of increasing control resources can reduce such Markovian errors,
but Hocker et al. considered a constrained set of control resources common to such physical
systems.

Ion Traps [16]: The ion trap quantum computer is based on a 2D lattice of confined
ions, each of which is a physical qubit that can be moved within the lattice to accommodate
local interactions between any two qubits. The ions are confined using an electromagnetic field.
Lasers are applied to induce coupling between qubit states to implement quantum gates. The
noise terms in these laser-mediated gates are associated with the intensity fluctuations of the
laser, resulting in very low gate errors. While ion traps are also quite stable to environmental
noise, being a charged system renders them susceptible to certain noise sources beyond those
experienced by neutral atoms.

Table 1. The gate times (in ns) for all basic gate constructs.

Table 2. The probability of error of the worst gate and the probability of an error occurring on an
idle qubit for all studied quantum architectures.

Technology        Control           Probability of    Memory Error
                                    Gate Error        (per ns)
----------------  ----------------  ----------------  ----------------
Quantum Dots      Primitive         9.89 x 10^-1      3.47 x 10^-2
Neutral Atoms     Optimal           8.17 x 10^-3      0.00
Neutral Atoms     Primitive         8.12 x 10^-3      0.00
Neutral Atoms     Solovay Kitaev    1.77 x 10^-3      0.00
Neutral Atoms     Trotter           1.47 x 10^-3      0.00
Photonics I       Primitive         1.01 x 10^-1      9.80 x 10^-4
Photonics II      Primitive         5.20 x 10^-3      9.80 x 10^-5
Superconductors   Optimal           6.56 x 10^-4      1.00 x 10^-5
Superconductors   Primitive         1.00 x 10^-5      1.00 x 10^-5
Ion Traps         Dyn. Cor. Gates   7.99 x 10^-3      2.52 x 10^-12
Ion Traps         Optimal           2.93 x 10^-7      2.52 x 10^-12
Ion Traps         Primitive         3.19 x 10^-9      2.52 x 10^-12

Photonics I [17,18]: This type of photonics system is essentially a form of "linear optics"
quantum computing, where conventional optical equipment is used with a single element that
introduces a nonlinear gate in order to generate two-qubit gates, in this case a controlled
phase gate [18]. The elementary single-qubit gates are implemented in the polarization basis
with wave plates. Electro-optic modulators (EOMs) can apply a desired wave plate action
commanded by classical signals, i.e., voltages. Measurement is based on photon detection,
which can be obtained using a single-photon counter such as an avalanche photodiode (APD).
A thermally excited electron can create a dark count, leading to a relatively high probability
of error (approximately 10%). Hocker et al. [3] use several ancillary photons and repeated
detection to achieve a sufficiently low measurement error that matches the error of other
gates.

Photonics II [19]: An alternative technology that uses photons as qubits abandons the
linear optics paradigm and attempts to use minimal optical equipment to achieve gate
operations. This technology performs two-qubit gates deterministically by using the weak cross-Kerr
coupling native to the optical equipment and homodyne detection [19]. This is in contrast
to the Photonics I approach, which instead goes to great lengths to harness stronger nonlinear
effects.

Quantum Dots [20]: Two electrons confined to a double-well quantum dot of GaAs
comprise a single qubit. The logical basis states are the lowest-lying hyperfine energy levels of the
two-qubit system, which can be tuned to a spin triplet or spin singlet regime by means of an
external voltage gate [20]. Controls are implemented through nearby voltage gates on each
well. There have been effective implementations of these systems with primitive control [20]
and dynamical decoupling. The reported errors are likely artificially high because of
approximations used in simulating two-qubit gates and difficulty with accurately simulating
the noise models. They are too high to make this technology usable with any error-correcting
scheme in our work. The studies of this technology are still ongoing, and results will likely
improve in further works.

In the models discussed above, the basic quantum gate set includes the following
operations: controlled-NOT (CNOT), swapping of two qubits (SWAP), the Hadamard gate (H), state
preparation ($|+\rangle$ prep., $|0\rangle$ prep.), qubit measurement (X meas., Z meas.), the Paulis (X, Y, Z),
and the S and T gates (S, T). We use this gate set throughout this document. Note that the
gate set is universal for quantum computation, and is overinclusive. All the quantum
technologies either support these operations natively, or the gates can be constructed from more
elementary operations. For example, SWAP can be constructed using three CNOT gates.
Note that there is often more than one way to compose a gate. For example, an alternative
way of decomposing the SWAP gate is to use an iSWAP, CPHASE, and two (parallel) S
gates. This particular decomposition can yield a substantially better error rate and gate time:
for example, it takes a total of 17 ns on the superconducting architecture, compared to the
66 ns needed to perform three CNOTs. The report
of Hocker et al. [3] describes the optimal choices for these decompositions.
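
The statement that SWAP can be built from three CNOT gates is easy to check numerically. The sketch below is only an illustration in plain Python: it multiplies the 4 x 4 matrices in the basis |00>, |01>, |10>, |11> (first qubit is the more significant bit) and then compares the two gate times quoted above. The variable and function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# CNOT with the first qubit as control and the second as target.
CNOT_01 = [[1, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 0, 1],
           [0, 0, 1, 0]]

# CNOT with the second qubit as control and the first as target.
CNOT_10 = [[1, 0, 0, 0],
           [0, 0, 0, 1],
           [0, 0, 1, 0],
           [0, 1, 0, 0]]

# SWAP exchanges |01> and |10> and leaves |00> and |11> unchanged.
SWAP = [[1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]

three_cnots = matmul(CNOT_01, matmul(CNOT_10, CNOT_01))
print("three CNOTs equal SWAP:", three_cnots == SWAP)

# Gate times quoted in the text for the superconducting architecture (in ns).
time_three_cnots = 66
time_alternative = 17
speedup = time_three_cnots / time_alternative
print(f"alternative decomposition is about {speedup:.1f}x faster")
```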

The  durations  of  basic  one-  and  two-qubit  gates  are  shown  in  Table  1.  Note  that  all  units 
of  time  in  this  paper  are  in  nanoseconds,  unless  stated  otherwise.  Another  important  property 
of  the  technologies  is  reliability.  Table  2  summarizes  the  error  probability  after  applying  the 
worst  gate,  as  well  as  the  probability  of  a  bit  flip  per  nanosecond  on  an  idle  qubit. 

The models of Hocker et al. [3] consider many details that a realistic computer must
possess, including a basic instruction set, errors due to qubit movement and decoherence, and
the  use  of  currently  known  control  protocols  to  optimize  properties  of  quantum  gates.  While 
the  experimental  demonstrations  of  these  technologies  to  date  have  been  limited  to  small  scale, 
the  models  used  here  are  among  the  most  plausible  and  realistic  for  a  large  scale  quantum 
computer. 

4  Quantum  Algorithms 

In order to study the resource requirements of quantum computation, we need to use
representative quantum algorithms with the right "mix" of quantum gates and known parallelism
properties. For this purpose, the following algorithms were studied in the IARPA Quantum
Computer Science program, and we report their properties here:

•  binary  welded  tree  algorithm  [4]  which  finds  the  opposite  root  of  two  connected  binary 
trees, 

•  boolean  formula  algorithm  [5]  which  evaluates  a  boolean  formula, 

•  ground  state  estimation  algorithm  [6]  which  finds  the  ground  state  energy  of  a  molecule, 

•  quantum linear systems algorithm [7] which finds x in the linear system Ax = b,

•  shortest vector algorithm [8] which finds the unique shortest vector in an integer lattice,

•  quantum  class  number  algorithm  [9]  which  finds  the  order  of  the  class  group  of  a  real 
quadratic  field,  and 

•  the  triangle  finding  problem  [10]  which  finds  the  nodes  forming  a  triangle  in  a  dense 
graph. 

These algorithms represent several key algorithmic techniques, including quantum Fourier
transform, quantum simulation, amplitude amplification, quantum random walk,
and phase estimation. These techniques form the building blocks of many other
quantum algorithms.

The resource requirements reported in this section use problem sizes specified in the
IARPA Quantum Computer Science program, and are studied in [4-10]. The problem sizes in the
IARPA program were chosen so that they are intractable for classical computers.
Therefore, it should not be surprising that some of the studied algorithms may be intractable for
quantum computers as well. We describe the algorithms and the problem sizes next.

4.1  Description of the Quantum Algorithms

Binary Welded Tree [21]: The problem input is a pair of binary trees that are connected at
their leaves. The goal is to start at the root of one of the trees, marked as 'entrance', and find the
opposite root, marked as 'end', by traversing the graph. The algorithm uses an oracle to discover
the structure of the graph. The oracle returns the names of the three neighbors of a specified
vertex. Due to a careful construction of the problem (each node has an exponential number
of labels, trees are connected in a specific way, etc.), no sub-exponential classical algorithm
that finds the 'end' with high probability is known. A quantum algorithm that uses a continuous-time
quantum random walk finishes in polynomial time. In order to evaluate a problem instance
that is intractable for classical computers, the tree depth was chosen as n = 300.

Boolean Formula [22]: This problem uses the algorithm of Childs et al. [22] for solving
the boolean formula problem, which determines if an {AND, OR} tree evaluates to true. The
boolean formula algorithm is used to find the best overall strategy in a two-player game of
Hex with a 9 by 7 board. The sequence of moves in a two-player game can be represented by
an {AND, OR} tree where the leaves correspond to the outcomes of the game. The algorithm
is based on a discrete-time quantum walk. An oracle indicates which of the two players wins.
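
To make the problem statement concrete, the following Python sketch evaluates a small {AND, OR} tree classically. It illustrates only the problem being solved, not the quantum walk algorithm of [22]; the tree encoding (booleans for leaves, `(operator, children)` tuples for internal nodes) and the example trees are assumptions made for this illustration.

```python
def evaluate(node):
    """Evaluate an {AND, OR} tree.

    A leaf is a bool (True if the first player wins that outcome);
    an internal node is a tuple (op, children) with op in {"AND", "OR"}.
    """
    if isinstance(node, bool):
        return node
    op, children = node
    values = (evaluate(child) for child in children)
    if op == "AND":
        return all(values)
    if op == "OR":
        return any(values)
    raise ValueError(f"unknown operator: {op!r}")


# The first player needs SOME move (OR) such that EVERY reply of the
# second player (AND) still leads to a win for the first player.
winning_tree = ("OR", [("AND", [True, False]), ("AND", [True, True])])
losing_tree = ("OR", [("AND", [True, False]), ("AND", [False, True])])

print(evaluate(winning_tree), evaluate(losing_tree))
```

A classical evaluation like this one may have to visit a number of leaves that grows exponentially with the depth of the game tree, which is what makes the instance in this report demanding.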

Ground State Estimation [23]: This algorithm finds the ground state energy, $E_0$, of
a specified molecule. The energy is estimated to b bits of precision. The quantum circuit
for the algorithm is studied in [6], which quantifies the logical gate counts in the quantum
circuit as a function of the precision b and the number of wave functions M that describe
the ground state. The polynomial-time quantum algorithm is based on the approach outlined
in [23], which uses quantum simulation and phase estimation. In this work, we chose a problem
instance that finds $E_0$ for the $\mathrm{Fe_2S_2}$ molecule with b = 9 bits of precision. Note that the
$\mathrm{Fe_2S_2}$ molecule needs M = 208 wave functions to describe the ground state.

Quantum Linear Systems [24]: The algorithm finds the solution to a linear system
Ax = b by mapping it into a quantum system $A|x\rangle = |b\rangle$ with state vectors $|x\rangle$ and $|b\rangle$
and Hamiltonian A. Quantum phase estimation and Fourier transform are used to solve for
$|x\rangle$ by extracting the eigenvalues of A. The studied problem size is $\dim(A) = 3 \times 10^{8}$. The
report [7] analyses a deterministic variant of the quantum linear systems algorithm where a
non-deterministic measurement is replaced by estimating probabilities using amplitude
estimation.

Shortest Vector Algorithm [25]: Given an $n \times n$ integer lattice B, the algorithm finds
an integer vector v such that the vector Bv has minimal length under the Euclidean norm.
The problem formulation also guarantees that the shortest vector is unique: the next shortest
vector is longer by a factor of at least $n^3$. The conceptual primitives used by the quantum
algorithm are quantum Fourier transform and sieving. This algorithm is not efficient because
it runs in a time longer than poly(n). Nevertheless, it is interesting because it uses a variety
of primitives, some of which are not explored by the other quantum algorithms. The instance
size was chosen as n = 50.

Quantum  Class  Number  [26]:  This  algorithm  finds  the  order  of  the  class  group  of  a 
real  quadratic  number  field.  The  analysis  is  based  on  the  work  of  Hallgren  [26]  that  shows 
how  the  class  number  of  a  real  quadratic  number  field  may  be  computed  by  extending  a 
standard  algorithm  for  the  hidden  subgroup  problem.  The  algorithm  also  uses  quantum 
Fourier  transform.  The  size  of  the  problem  was  chosen  to  be  n  =  124  decimal  digits  in  the 
quadratic  discriminant. 

The Triangle Finding Problem [27]: Given a dense graph, this algorithm finds the
nodes in the graph forming a triangle if one exists. As in the case of the binary
welded tree algorithm, an oracle and a specific graph structure are used to ensure the problem
does not have an efficient classical solution. The graph is dense, containing fully connected
sub-components, and only one triangle, if any. The conceptual primitives used by the efficient
quantum algorithm are quantum random walk and amplitude amplification. In this work we
chose a graph with 32,768 nodes.

4.2  Methodology for Logical Resource Estimation

The  algorithm  analyses  in  [4-10]  report  logical  gate  counts  that  contain  discrete  quantum  gates 
as  well  as  arbitrary  rotations.  Each  of  these  arbitrary  rotations  needs  to  be  decomposed  into  a 
discrete  set  of  gates  that  can  be  implemented  fault  tolerantly.  To  perform  this  decomposition, 
we use the result of Bocharov et al. [28] that shows how to obtain a minimal-depth decomposition
in the {H, T} basis, ensuring that the occurrence of expensive T gates is minimized. Their
approach is based on the Solovay-Kitaev theorem [29].

The result of Bocharov et al. [28] states that for any $\epsilon$, an arbitrary single-qubit gate U
can be decomposed with precision $\epsilon$ using $O(\log^c(1/\epsilon))$ gates from the universal discrete gate
set. Figure 3 in [28] shows that a decomposition that uses 1,000 T gates results in a gate error
of $1 \times 10^{-7}$. Furthermore, increasing the number of T gates in the decomposition by one order
of magnitude improves the error by three orders of magnitude. Our QuRE toolbox uses the
results of this empirical study. We first determine the desired precision of the decomposition
so that 50% probability of success of the algorithm can be guaranteed. Then we quantify the
number of H and T gates per rotation.

Let arbitraryRot denote the number of arbitrary rotations in the algorithm, and let
errorPerRot be the desired maximal error for each arbitrary rotation. To guarantee that
the accumulated error across all rotations is at most 0.5, it is sufficient to require:

$$\mathit{errorPerRot} \le 0.5 / \mathit{arbitraryRot} \tag{1}$$

Then from Figure 3 in [28] the number of H and T gates per rotation is approximately:

$$\mathit{gatesPerRot} = 10^{\frac{2 - \log_{10}(\mathit{errorPerRot})}{3}} \tag{2}$$
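
As a quick numerical check of Equations (1) and (2), the following Python sketch computes the per-rotation error budget and the resulting gate count. It assumes that the exponent in Equation (2) is (2 - log10(errorPerRot))/3, which is the reading that matches the stated rule of three orders of magnitude of error per order of magnitude of gates. The function name and the example rotation count are illustrative and are not part of the QuRE toolbox.

```python
import math


def gates_per_rotation(arbitrary_rot, total_error=0.5):
    """Return (errorPerRot, gatesPerRot) following Equations (1) and (2)."""
    # Equation (1): split the total error budget evenly across all rotations.
    error_per_rot = total_error / arbitrary_rot
    # Equation (2), with the assumed exponent (2 - log10(errorPerRot)) / 3.
    gates = 10 ** ((2 - math.log10(error_per_rot)) / 3)
    return error_per_rot, gates


# Hypothetical algorithm with five million arbitrary rotations.
err, gates = gates_per_rotation(5_000_000)
print(f"errorPerRot = {err:.1e}, gatesPerRot ~ {gates:.0f}")
```

With this budget of about 1e-7 per rotation, the sketch lands near the 1,000-gate data point quoted from Figure 3 in [28].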


4.3 The Logical Resource Requirements

The summary of the properties of the quantum algorithms appears in Tables 3, 4, and 5. Note
that the data in these tables summarizes the logical resource requirements of the algorithms
before error-correction overhead is taken into account. In particular, Table 5 shows the number
of logical qubits needed by each algorithm. This count includes all ancillas. Table 3 shows the
number of logical quantum gates of each type needed to implement each algorithm. Finally,
Table 4 shows the parallelization factor for each gate type. The parallelization factor indicates
how many of the gates of a particular type can be performed in parallel. For example, in the
case of the ground state estimation algorithm, we see that $1.51 \times 10^{10}$ logical Hadamard gates
are needed, and the parallelization factor is 6, meaning that on average 6 H gates can be
scheduled simultaneously in the logical quantum circuit.

Table 5. The number of logical qubits required by each algorithm.

Algorithm              Logical Qubits
Binary Welded Tree     1.22 x 10^[...]
Boolean Formula        2.63 x 10^[...]
Class Number           1.88 x 10^[...]
Ground State Est.      2.20 x 10^[...]
Quant. Linear Syst.    2.55 x 10^[...]
Shortest Vector        4.00 x 10^[...]
Triangle Finding       9.04 x 10^[...]

5  Error  Correction  with  Concatenated  Codes 

This section describes our high-level approach to error correction with concatenated
codes. First, in Subsection 5.1, we introduce a qubit layout that uses tiles as building blocks,
and  describe  the  functionality  of  a  single  tile.  Then  in  Subsection  5.2  we  explain  why  it  is 
necessary  to  customize  the  qubit  layout  in  each  tile  to  meet  the  specific  requirements  of  each 
error-correcting  code  /  quantum  technology  combination.  In  Subsection  5.3  we  explain  how 
to  determine  the  optimal  number  of  concatenation  levels  of  error  correction  as  well  as  how 
to  estimate  the  success  probability  of  the  algorithm.  Finally,  in  Subsection  5.4  we  quantify 
the  resources  needed  to  correct  memory  errors  and  in  Subsection  5.5  we  justify  our  choice  of 
syndrome  extraction  method. 

5.1  The  Tiled  Qubit  Layout 

Our  qubit  layout  for  concatenated  codes  is  modeled  after  the  microarchitecture  of  Svore  et 
al.  [11],  and  Spedalieri  et  al.  [12].  In  particular,  we  use  a  block  structure,  where  each  building 
block  (tile)  stores  a  logical  qubit.  Each  tile  contains  enough  space  to  store  one  data  qubit, 
one ancilla, a sufficient number of verification qubits (to allow ancilla verification), and space
for  all  data  qubits  from  one  neighboring  tile.  Error  correction  can  be  performed  in  each  tile 
by  applying  the  correct  gate  sequence.  Communication  between  tiles  is  achieved  by  swapping. 
For  example,  to  perform  a  CNOT  operation  on  two  neighboring  tiles,  the  data  qubits  from 
one  tile  are  moved  into  the  other  tile,  the  CNOT  operation  is  performed,  and  then  the  qubits 
return  to  their  original  location. 

A high-level picture of the architectural organization is shown in Figure 2. We assume that the
tiles are arranged in a 2-D structure. An example of the structure of a tile is shown in Figure 3.
The figure shows the layout of Svore et al. [11], which has tile size 6 x 8. At higher levels of
concatenation, the structure is expanded in a hierarchical fashion using 6 x 8 building blocks
from the next lower level. Note that the tile size will differ for different concatenated codes,
as enough qubits must be present to store one logical data qubit as well as the ancillas needed to
perform error correction on the data.

[Figure 2 panels: Classical Control; Tile 1 through Tile 6, each with a sea of lower-level qubits]

Fig. 2. The qubit layout consists of tiles. Each tile represents one logical qubit.

Fig. 3. One tile at the second concatenation level (bounded by the large green box). A tile at the
first level of concatenation consists of the qubits in the smaller red box.

5.2  Customizing  Qubit  Layout 

The layout needs to be customized for each concatenated code. The Steane code uses a
tile of size 6 x 8, whereas the Bacon-Shor code uses a tile of size 7 x 7. The qubit layout
and sequence of operations needed to perform error correction were optimized in [11,12] to
maximize the error-correction threshold and minimize movement and the number of operations.
Therefore, we use the same layout. Since the optimal layout for Knill’s post-selection
scheme has not been studied, we had to design our own tile and sequence of operations.

5.3 Finding the Optimal Concatenation Level

Here we answer the question of how many levels of concatenation are needed to achieve a desired
fidelity. Let us assume that p is the gate error, i.e., the failure probability of components at the
lowest level of the code. Fault-tolerant constructions of the gates guarantee that the probability
that a circuit introduces two errors is $O(p^2) = cp^2$ if a single level of encoding is used
(i.e., without further concatenation). Here $c = 1/p_{th}$ is a constant determined by the threshold
$p_{th}$ of the concatenated code. It follows that with two levels of concatenation the probability of
failure becomes $c(cp^2)^2$, and with k levels of concatenation $(cp)^{2^k}/c$. Suppose that we wish to
simulate a circuit with N gates and achieve a final accuracy of $\epsilon$. Each gate must then be
accurate to $\epsilon/N$, so we need to find k satisfying:

$$\frac{(cp)^{2^k}}{c} \le \frac{\epsilon}{N}$$

In this work we assume that the circuit needs to finish with success probability of at least
50%, i.e., $\epsilon = 0.5$. Solving the inequality for k with $c = 1/p_{th}$ gives the desired concatenation level

$$k = \left\lceil \log_2 \frac{\log(2 N p_{th})}{\log(p_{th}/p)} \right\rceil,$$

and the probability with which the circuit outputs the correct result is

$$p_{success} = \left(1 - p_{th}\,(p/p_{th})^{2^k}\right)^N.$$

Note that the success probability $p_{success}$ will generally be in the 50% to 100% range due to
the use of the ceiling function when calculating the desired concatenation level k.

5.4 Correcting Memory Errors

Error correction needs to be performed periodically even for idle qubits on which no quantum
gates act. As was shown in Section 3, the memory error rate for a single qubit ranges up to
$p_{mem} = 3.47 \times 10^{-2}$ per ns. To estimate the resources needed to correct memory errors, we
assume that these errors are corrected periodically for all qubits, and calculate the optimal
amount of time T between subsequent error-correction steps. This approach should not lead
to significant overestimation of resources because our algorithm analysis shows that most
of the qubits are idle most of the time, and successive gates that are always followed by
error-correction operations frequently act on the same set of qubits.

Error correction needs to be performed early enough to ensure that the probability of an
error on any single qubit is below the gate error probability p. Thus we require $T = p/p_{mem}$.
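
The relation T = p/p_mem is simple enough to evaluate directly. The following sketch does so for a hypothetical technology; the function name and the numerical values are assumptions made for illustration and are not taken from Section 3.

```python
def ec_interval_ns(p_gate, p_mem_per_ns):
    """Time T in ns between memory error-correction steps, T = p / p_mem."""
    if p_mem_per_ns <= 0:
        raise ValueError("memory error rate must be positive")
    return p_gate / p_mem_per_ns


# Hypothetical technology: gate error 1e-3, memory error rate 1e-6 per ns.
interval = ec_interval_ns(p_gate=1e-3, p_mem_per_ns=1e-6)
print(f"correct idle qubits every {interval:.0f} ns")
```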

5.5  Choice  of  Syndrome  Extraction 

The three possible error syndrome extraction methods are Knill, Shor, and Steane. We use
the Steane extraction method for the Steane and Bacon-Shor codes and Knill’s method for
Knill’s post-selection scheme for the following reasons:

1. Steane code: for our qubit layout, it is important to minimize the use of space and
thus movement, potentially at the cost of increased waiting and memory errors. The
reason is that memory errors will typically occur at lower rates than SWAP gate errors.
This leaves us with the choice between Shor’s and Steane’s methods, because Knill’s
method requires two ancillas, and hence our tiles would have to be much bigger and would
require more operations per gate and more physical qubits, and all operations would take longer.
Of the remaining two, Shor’s method is slightly more space efficient if implemented fully
sequentially. However, in Shor-type syndrome extraction, the number of gates between
data and ancilla qubits is greater than in Steane-type syndrome extraction, which will
negatively affect the threshold, requiring more concatenations and more frequent
correction of memory errors on idle qubits. Therefore, we believe that Steane’s method is
superior to the other two. Steane syndrome extraction was chosen for similar reasons
in [11].

2. Bacon-Shor code: We chose Steane syndrome extraction for the same reasons as
above.

3. Knill’s post-selection scheme: Knill’s post-selection scheme requires the use of
Knill’s method.

6  Error  Correction  with  the  Steane  Code 

Now we describe the overhead imposed by error correction using the Steane [[7,1,3]] code [1,2].
First  in  Subsection  6.1  we  provide  some  notation  and  explain  how  gates  can  be  implemented 
fault  tolerantly.  Then  in  Subsection  6.2  we  describe  the  tile  size  and  qubit  locations  within  a 
tile  optimized  for  the  Steane  code.  Then  in  Subsection  6.3  we  provide  resource  estimates  for 
each  logical  gate  assuming  that  the  Steane  code  uses  m  levels  of  concatenation.  Note  that 
numerical  results  appear  in  Section  15  at  the  end  of  the  document.  There  we  illustrate  the 
resource  estimation  methodology  developed  in  this  section  by  showing  the  resources  used  by 
elementary  logical  gates  at  different  concatenation  levels,  as  well  as  the  total  resources  needed 
by  each  algorithm  on  all  the  studied  physical  technologies. 

6.1  The  Steane  Code 

Logical qubits in the Steane code are encoded using seven qubits. The Steane code is a
stabilizer code [30] with stabilizers $g_1 = IIIXXXX$, $g_2 = IXXIIXX$, $g_3 = XIXIXIX$,
$g_4 = IIIZZZZ$, $g_5 = IZZIIZZ$, $g_6 = ZIZIZIZ$ that map the encoded logical states to
themselves. Therefore, the logical states $|0\rangle$ and $|1\rangle$ can be written as follows:

$$|0\rangle = \frac{1}{\sqrt{8}}\big[\,|0000000\rangle + |1010101\rangle + |0110011\rangle + |1100110\rangle + |0001111\rangle + |1011010\rangle + |0111100\rangle + |1101001\rangle\,\big]$$

$$|1\rangle = \frac{1}{\sqrt{8}}\big[\,|1111111\rangle + |0101010\rangle + |1001100\rangle + |0011001\rangle + |1110000\rangle + |0100101\rangle + |1000011\rangle + |0010110\rangle\,\big]$$

Note that the states $|0\rangle$ and $|1\rangle$ were chosen to have an even and odd number of
ones, respectively.
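
The codewords above can be regenerated mechanically. The sketch below relies on the standard property of CSS codes that the logical $|0\rangle$ is the uniform superposition over the binary span of the X-type stabilizer supports, and that the logical $|1\rangle$ is obtained by flipping all seven bits. The helper names are illustrative.

```python
from itertools import product

# Supports of the X-type stabilizers g1, g2, g3 written as bit strings.
X_GENERATORS = ["0001111", "0110011", "1010101"]


def xor(a, b):
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def span(generators):
    """All XOR combinations of the generators (their binary linear span)."""
    words = set()
    for coeffs in product([0, 1], repeat=len(generators)):
        word = "0" * len(generators[0])
        for use, gen in zip(coeffs, generators):
            if use:
                word = xor(word, gen)
        words.add(word)
    return words


zero_terms = span(X_GENERATORS)                           # basis states in logical |0>
one_terms = {xor(w, "1111111") for w in zero_terms}       # basis states in logical |1>

print(sorted(zero_terms))
print(sorted(one_terms))
```

Every string in the first set has an even number of ones and every string in the second set has an odd number, which is the parity property used later for logical measurement.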

The Steane code can be concatenated: each Level m encoded qubit block is built using
seven Level m - 1 logical qubits and gates. Level 1 blocks are at the lowest level of encoding
and they are built of Level 0 qubits and gates provided at the physical level.

We will use the following notation. Standard gates at the m-th level of concatenation
are denoted by $X_{(m)}$, $Y_{(m)}$, $Z_{(m)}$, $\mathit{CNOT}_{(m)}$, $H_{(m)}$, $S_{(m)}$, $T_{(m)}$. Measurement in the X and
Z basis will be denoted by $\mathcal{M}_{X(m)}$ and $\mathcal{M}_{Z(m)}$. Fault-tolerant preparation of states $|0\rangle$
and $|+\rangle$ at the m-th level of concatenation is denoted by $P_{|0\rangle(m)}$ and $P_{|+\rangle(m)}$. The
error-correction operation is represented by $\mathcal{EC}_{(m)}$, and error detection is represented by
$\mathcal{ED}_{(m)}$. In our analysis, the sequence of operations required to implement a specified gate or
operation is denoted ops(...), and the time required to perform the specified operation when
parallelization is taken into account is denoted time(...), where ... is replaced by the operations
we wish to analyze.

It is easy to verify that the two circuits in Figure 4 produce the logical states $|0\rangle$ and $|+\rangle$.
However, these circuits are not fault tolerant, as a single fault will lead to preparation of an
incorrect state. The fault-tolerant implementation of these circuits is shown in Figure 5. To
prepare state $|0\rangle$, both the data code block $|Q\rangle$ and an ancilla $|V\rangle$ are non-fault-tolerantly
initialized in state $|0\rangle$. A fault in the preparation is detected by performing a CNOT on the
two code blocks and measuring the ancilla in the Z basis. If the measurement outcome is -1,
an error occurred and the process must be repeated. The state $|+\rangle$ can be prepared similarly,
as shown on the right-hand side of the figure.

Fig. 4. Non-fault-tolerant circuit for preparation of state $|0\rangle$ on the left and $|+\rangle$ on the right.

Fig. 5. Fault-tolerant preparation of state $|0\rangle$ and $|+\rangle$ must be repeated until the measurement
outcome c is 1.

The Steane code belongs to the family of Calderbank-Shor-Steane (CSS) codes, a family of
codes with transversal implementation of most gates, including the CNOT gate. A transversal
CNOT gate at Level m can be obtained by applying seven CNOT gates to the corresponding
control and target qubits at Level m - 1. Figure 6 shows a fault-tolerant implementation of
the CNOT gate. Gates that can be performed transversally also include the single-qubit Pauli
gates and the S and H gates. Figure 6 also shows how to perform measurement. Since the logical
states $|0\rangle$ and $|1\rangle$ have an even and odd number of ones, respectively, a classical parity
calculation on the Z basis measurements of the seven code blocks distinguishes $|0\rangle$ and $|1\rangle$.
To ensure fault tolerance, each logical gate is followed by error correction.

In order to have a universal gate set, we also need the π/8 gate, called the T gate.
Unfortunately, a fault-tolerant version of this gate cannot be constructed transversally. A
fault-tolerant T gate is shown in Figure 7. This gate sequence was originally constructed in [31]
using one-bit teleportation. The gate sequence teleports the state $|\psi\rangle$ from the data block to
the ancilla and applies the T gate to the state. Note that the ancilla must be fault-tolerantly
initialized in the state $|\phi_+\rangle = TH|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle)$. This initialization is shown
in the dashed block, and requires two fault-tolerant preparations of the cat state
$\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, which is depicted in Figure 8.

To correct errors, we use the Steane error extraction method [2,32], which can be better
parallelized and uses fewer gates than the cat state method [32]. A fault-tolerant implementation
of the Steane method is shown in Figure 9. Two ancillas are prepared fault-tolerantly, one in
state $|+\rangle$ and the other in state $|0\rangle$. X errors are corrected first. The first CNOT does not
affect the encoded state of the ancilla $|+\rangle$ or the encoded code block, but it propagates each
X error in the data to the corresponding position on the $|+\rangle$ ancilla. The Z-type syndrome,
which detects X errors, is extracted by measuring the ancilla qubits in the Z basis and applying
a classical parity check to the measurement outcomes. E.g., for a stabilizer $\bigotimes_{i=1}^{n} Z^{b_i}$ and
measurement outcome $(z_1, \ldots, z_n)$ the eigenvalue of the stabilizer is $b \cdot z$. The X-type error
correction is then performed as indicated by the syndromes; this is represented in the circuit
in Figure 9 by the symbol $\mathcal{R}_X$. The X-type stabilizers are obtained similarly by coupling
the data to the ancilla initialized in state $|0\rangle$, applying a CNOT and X basis measurement
followed by the parity check and Z-type error correction, if any.

Fig. 6. Fault-tolerant implementation of the CNOT gate (on the left), the single qubit gates X,
Y, Z and S (top right corner), and Z basis measurement (bottom right).

Fig. 7. Fault-tolerant T gate construction.
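
The classical parity check described above can be illustrated for the three Z-type stabilizers of the Steane code. In this sketch the measured bit string is a codeword with at most one flipped bit, and the three parities, read as a binary number, give the position of the flipped qubit (0 meaning no error). The function names and the example outcome are illustrative assumptions, and the sketch handles only a single level of encoding.

```python
# Supports b of the Z-type stabilizers g4, g5, g6 of the Steane code.
Z_CHECKS = ["0001111", "0110011", "1010101"]


def syndrome(outcomes):
    """Parity b . z (mod 2) of the measured bits for each Z-type stabilizer."""
    return [
        sum(int(b) * z for b, z in zip(check, outcomes)) % 2
        for check in Z_CHECKS
    ]


def flipped_qubit(outcomes):
    """Position (1..7) of a single bit flip, or 0 if all parities are even."""
    s4, s5, s6 = syndrome(outcomes)
    return 4 * s4 + 2 * s5 + s6


clean = [1, 0, 1, 0, 1, 0, 1]      # the codeword 1010101, no error
noisy = [1, 0, 1, 0, 0, 0, 1]      # same codeword with qubit 5 flipped
print(flipped_qubit(clean), flipped_qubit(noisy))
```

The same routine applies to the X-type syndrome after an X basis measurement, since the X-type and Z-type stabilizers of the Steane code have identical supports.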

Note that the error detection operation denoted $\mathcal{ED}$ differs from the error correction $\mathcal{EC}$
depicted in Figure 9 by removing the two error-correction operations $\mathcal{R}_X$ and $\mathcal{R}_Z$.

Fig. 8. Fault-tolerant circuit for cat state preparation.

Fig. 9. Illustration of the Steane-EC syndrome extraction method.


6.2  Tiled  Layout  for  the  Steane  Code 

In order to appropriately quantify the resources required, we first describe the physical layout
being used. We use the tile structure of [11], designed to minimize the number of SWAP
operations used during error-correction routines, and thus preserve a high error threshold. The
tile consists of a 6 x 8 lattice of qubits. The following figure shows a snapshot of the operations
and state of the tile during the ancilla preparation part of error correction:

[Figure: snapshot of the 6 x 8 tile during ancilla preparation]

Ancilla qubits are labeled with a and data qubits are labeled with d. Ancilla qubits labeled
with v are used for verification. O locations represent dummy qubits that are used as channels
when qubits are swapped. For details on the exact gate sequences being used we refer the
reader to the original paper [11]. The threshold obtained for this layout, and the corresponding
circuits implementing the error-correction subroutines, is $p_{th} = 3.6 \times 10^{-5}$, which compares
favorably to the idealized threshold $p_{th} = 1.85 \times 10^{-5}$ that ignores the cost of movement [11].

It is easy to calculate the number of physical qubits in the system. In a system with n
tiles and the Steane code with l levels of concatenation, the number of physical qubits we need
is $n(8 \times 6)^l$. Note that the number of physical qubits is different when the Bacon-Shor code and
the Knill scheme are used. First, these codes use tiles of different size. Second, these codes
require the use of ancillas, and additional space is needed for ancilla factories. We provide
estimates of the size of ancilla factories in our Bacon-Shor code and Knill scheme analysis in
the subsequent sections.
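
The qubit count for the Steane code follows directly from the tile size. A minimal sketch, with an illustrative function name and example inputs:

```python
def steane_physical_qubits(n_tiles, levels, tile_rows=6, tile_cols=8):
    """Physical qubits for n tiles at l concatenation levels: n * (8 x 6)^l."""
    return n_tiles * (tile_rows * tile_cols) ** levels


# Hypothetical machine: 10 logical qubits (tiles) at two levels of concatenation.
print(steane_physical_qubits(10, 2))
```

Each additional level of concatenation multiplies the count by 48, so the required concatenation level dominates the physical qubit budget.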


M.  Suchara  et  al.  19 


6.3  Quantifying  Resources  with  m  Levels  of  Concatenation 

Here we estimate the resources needed to perform a single logical gate at Level m of
concatenation with the Steane code. Specifically, we calculate the number of Level 0 gates, and
the total gate time taking parallelization into account. We distinguish horizontal and vertical
CNOT gates: horizontal and vertical CNOTs act on qubits (or tiles) that are adjacent along
the horizontal and vertical axes, respectively. Similarly, we distinguish horizontal and vertical
SWAP gates. On the right-hand sides below, an operation symbol stands for its operation count
in the ops(...) equations and for its duration in the time(...) equations. We express the
resources for a single logical gate at Level m as follows:

Error  Detection  and  Correction 

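A fault-tolerant implementation of the quantum error-correcting protocol for a single logical code
block (assuming ancilla preparation always succeeds):

$$\begin{aligned}
\mathrm{ops}(\mathcal{EC}_{(m)}) ={}& (4+4+6)\,\mathit{hCNOT}_{(m-1)} + (21+7)\,\mathit{vCNOT}_{(m-1)} + 7H_{(m-1)} \\
&+ 18\,\mathit{hSWAP}_{(m-1)} + 15\,\mathit{vSWAP}_{(m-1)} + 8P_{|+\rangle(m-1)} + 12P_{|0\rangle(m-1)} + 20\mathcal{M}_{X(m-1)}
\end{aligned} \tag{4a}$$

$$\begin{aligned}
\mathrm{time}(\mathcal{EC}_{(m)}) ={}& 2\max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}) \\
&+ 2\max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}, \mathit{hCNOT}_{(m-1)}, \mathit{vCNOT}_{(m-1)}) \\
&+ 2\max(\mathit{hSWAP}_{(m-1)}, \mathit{vSWAP}_{(m-1)}, \mathit{vCNOT}_{(m-1)}, P_{|0\rangle(m-1)}) \\
&+ 2\max(\mathit{hSWAP}_{(m-1)} + \mathit{hCNOT}_{(m-1)}, \mathit{vCNOT}_{(m-1)}) \\
&+ 2\max(\mathit{hSWAP}_{(m-1)}, \mathit{hCNOT}_{(m-1)}) \\
&+ 2\max(\mathit{hSWAP}_{(m-1)}, \mathit{vSWAP}_{(m-1)}, \mathit{hCNOT}_{(m-1)}, \mathcal{M}_{X(m-1)}) \\
&+ 2\max(\mathit{hSWAP}_{(m-1)}, \mathit{vSWAP}_{(m-1)}, \mathcal{M}_{X(m-1)}) \\
&+ 2\,\mathit{vCNOT}_{(m-1)} + 2\mathcal{M}_{X(m-1)}
\end{aligned} \tag{4b}$$

Error detection for a single logical code block:

$$\mathrm{ops}(\mathcal{ED}_{(m)}) = \mathcal{EC}_{(m)} \tag{5a}$$

$$\mathrm{time}(\mathcal{ED}_{(m)}) = \mathcal{EC}_{(m)} \tag{5b}$$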

Fault-Tolerant  Horizontal  and  Vertical  CNOT  Gate 

ops(hCNOT(m))  =  7nCVOT(m„1)  +  U2hSWAP(m_1)  +  UvSWAP, +  £C(m)  (6a) 

ops{vCNOT(rn) )  =  rivCNOT(m_1)  +  12  hSWAPim_1}  +  70  vSWAP{m_x)  +  £C(m)  (6b) 

time{hCNOT(m))  =  max(hSW AP(m_ip  vSW AP(m_i))  +  6hSW dP(m_!)  (6c) 

-f-  vCNOT^m_1^  -f-  max(P\+'j(rn_i') ,  P|o)(m— 1) ?  hSW  AP^m_  1) ) 

+  ?naa;(P|+)(m_1),  P|o)(m-i)>  hCN  OT^m_l^),vCN  OT^m_^,  hSW  AP(m_i )) 

-f-  ??m:r  (P|^q(m_i) ,  P|o)(m— 1) )  A  n'mx(P|_(_^  (m— 1) ,  P|o)  (m— 1)  5  hCNOT^rn_ 1)  5  vC NOT^m—  1) ) 

+  2max(hSW  AP(m_1} ,  uSW.  AP(m_1} ,  vCNOT^y ,  P|0>(m-i) ) 

+  2max(/iSWAP(m_1)  +  hCNOT^^pvCNOT^^) 

+  2max(hSW  AP(m_i),  hCNOT^m-i)) 

+  2rnaa;(/iS'lVAP(m_1),n6'lTAP(m_1),/iC'VOT(TO_1),  Atx(m-i)) 

+  2maa;(/iS'l/LAP(m_1),n/S'lTAP(m_1),  Mx(m-i))  +  2  nCiVOP(m_1)  +  2A4x(m-i ) 


(5a) 

(5b) 
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time(vCNOT(m'))  =  max(vSW  AP^m_i),  hSW (6d) 
+  2max(vSWAP{m_1),vCNOTirn_1))  +  4  vSWAP(m_1} 

+  max  (Pj  +)  (m- i)  ,  P\  o>  (m- 1) ,  hSW  AP(m_  ,  vSW  AP{m_  1} ) 

+  max  (Pj  +)  (m_i) ,  Pj  o)  (m-i) ,  hC. NOT^m_  x) ,  vCNOT, (m-i) ,  hSW  AP(m_  ) 

P  max (-P|_(_^m_i) ,  -P|o)(m— i) )  P  max(P\  ^ (m—  i ) ?  -^lo) (m— i) ?  hCNOT^m_ i) ,  vC N ) 

+  2maa;  (hSW  ,  uSW.  AP(m_1} ,  vC'7V0T(„l_1) ,  P|0>(m-i) ) 

+  2max(hSWAP(m_1)  +  hCNOT^^vCNOT^^) 

+  2max(hSWAP{m_1),hCNOT{m_1)) 

P  2max(hSWAP(m_1),vSWAP(rn_1),hCNOT{m_1),MX(m-i)) 

+  2max{hSW  AP(jn_i'>,vSW  AP{jn_i),A4x(m-i))  +  2  vCNOT(m-i)  +  2A4x(m-i) 


Fault-Tolerant  Horizontal  and  Vertical  SWAP  Gate 

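$$\mathrm{ops}(\mathit{hSWAP}_{(m)}) = 56\,\mathit{hSWAP}_{(m-1)} + \mathcal{EC}_{(m)} \tag{7a}$$

$$\mathrm{ops}(\mathit{vSWAP}_{(m)}) = 42\,\mathit{vSWAP}_{(m-1)} + \mathcal{EC}_{(m)} \tag{7b}$$

$$\mathrm{time}(\mathit{hSWAP}_{(m)}) = 8\,\mathit{hSWAP}_{(m-1)} + \mathcal{EC}_{(m)} \tag{7c}$$

$$\mathrm{time}(\mathit{vSWAP}_{(m)}) = 6\,\mathit{vSWAP}_{(m-1)} + \mathcal{EC}_{(m)} \tag{7d}$$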

Fault-Tolerant  Pauli  Gates  and  Hadamard 

Fault-tolerant implementation of the X gate:

$$\mathrm{ops}(X_{(m)}) = 7X_{(m-1)} + \mathcal{EC}_{(m)} \tag{8a}$$

$$\mathrm{time}(X_{(m)}) = X_{(m-1)} + \mathcal{EC}_{(m)} \tag{8b}$$

The properties of the transversal gates $Y_{(m)}$, $Z_{(m)}$, $H_{(m)}$ as well as the transversal
constructions $S_{trans}$ and $T_{trans}$ are obtained by substituting for X in the equations above.


Fault-Tolerant  Measurements 

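Measurement in the X-basis:

$$\mathrm{ops}(\mathcal{M}_{X(m)}) = 7\mathcal{M}_{X(m-1)} + \mathcal{EC}_{(m)} \tag{9a}$$

$$\mathrm{time}(\mathcal{M}_{X(m)}) = \mathcal{EC}_{(m)} \tag{9b}$$

Measurement in the Z-basis:

$$\mathrm{ops}(\mathcal{M}_{Z(m)}) = 7\mathcal{M}_{Z(m-1)} + \mathcal{EC}_{(m)} \tag{10a}$$

$$\mathrm{time}(\mathcal{M}_{Z(m)}) = \mathcal{EC}_{(m)} + \mathcal{M}_{X(m-1)} + \max(\mathcal{M}_{X(m-1)}, \mathcal{M}_{Z(m-1)}) \tag{10b}$$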

Fault-Tolerant  Preparation  of  a  Logical  State 


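Note that the expected number of gates required to re-initialize the ancillas upon initialization
failure is negligible, and, therefore, we ignore these operations. Preparation of the logical state
$|+\rangle$:

$$\begin{aligned}
\mathrm{ops}(P_{|+\rangle(m)}) ={}& (2+2+3)\,\mathit{hCNOT}_{(m-1)} + 7\,\mathit{vCNOT}_{(m-1)} \\
&+ 7H_{(m-1)} + 9\,\mathit{hSWAP}_{(m-1)} + 11\,\mathit{vSWAP}_{(m-1)} \\
&+ 4P_{|+\rangle(m-1)} + 6P_{|0\rangle(m-1)} + 3\mathcal{M}_{X(m-1)} + \mathcal{EC}_{(m)}
\end{aligned} \tag{11a}$$

$$\begin{aligned}
\mathrm{time}(P_{|+\rangle(m)}) ={}& \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}) \\
&+ \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}, \mathit{hCNOT}_{(m-1)}, \mathit{vCNOT}_{(m-1)}) \\
&+ \max(\mathit{hSWAP}_{(m-1)}, \mathit{vSWAP}_{(m-1)}, \mathit{vCNOT}_{(m-1)}, P_{|0\rangle(m-1)}) \\
&+ \max(\mathit{hSWAP}_{(m-1)} + \mathit{hCNOT}_{(m-1)}, \mathit{vCNOT}_{(m-1)}) + \max(\mathit{hSWAP}_{(m-1)}, \mathit{hCNOT}_{(m-1)}) \\
&+ \max(\mathit{hSWAP}_{(m-1)}, \mathit{vSWAP}_{(m-1)}, \mathit{hCNOT}_{(m-1)}, \mathcal{M}_{X(m-1)}) \\
&+ \max(\mathit{hSWAP}_{(m-1)}, \mathit{vSWAP}_{(m-1)}, \mathcal{M}_{X(m-1)}) + \mathcal{EC}_{(m)} \\
&- \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}) + \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}, \mathit{vSWAP}_{(m-1)}) \\
&- \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}, \mathit{hCNOT}_{(m-1)}, \mathit{vCNOT}_{(m-1)}) \\
&+ \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}, \mathit{hCNOT}_{(m-1)}, \mathit{vCNOT}_{(m-1)}, H_{(m-1)})
\end{aligned} \tag{11b}$$

Preparation of the logical state $|0\rangle$:

$$\begin{aligned}
\mathrm{ops}(P_{|0\rangle(m)}) ={}& (2+2+3)\,\mathit{hCNOT}_{(m-1)} + 7\,\mathit{vCNOT}_{(m-1)} + 9\,\mathit{hSWAP}_{(m-1)} \\
&+ 11\,\mathit{vSWAP}_{(m-1)} + 4P_{|+\rangle(m-1)} + 6P_{|0\rangle(m-1)} + 3\mathcal{M}_{X(m-1)} + \mathcal{EC}_{(m)}
\end{aligned} \tag{12a}$$

$$\begin{aligned}
\mathrm{time}(P_{|0\rangle(m)}) ={}& \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}) \\
&+ \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}, \mathit{hCNOT}_{(m-1)}, \mathit{vCNOT}_{(m-1)}) \\
&+ \max(\mathit{hSWAP}_{(m-1)}, \mathit{vSWAP}_{(m-1)}, \mathit{vCNOT}_{(m-1)}, P_{|0\rangle(m-1)}) \\
&+ \max(\mathit{hSWAP}_{(m-1)} + \mathit{hCNOT}_{(m-1)}, \mathit{vCNOT}_{(m-1)}) \\
&+ \max(\mathit{hSWAP}_{(m-1)}, \mathit{hCNOT}_{(m-1)}) \\
&+ \max(\mathit{hSWAP}_{(m-1)}, \mathit{vSWAP}_{(m-1)}, \mathit{hCNOT}_{(m-1)}, \mathcal{M}_{X(m-1)}) \\
&+ \max(\mathit{hSWAP}_{(m-1)}, \mathit{vSWAP}_{(m-1)}, \mathcal{M}_{X(m-1)}) + \mathcal{EC}_{(m)} \\
&- \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}) + \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}, \mathit{vSWAP}_{(m-1)})
\end{aligned} \tag{12b}$$

Preparation of the logical cat state:

$$\begin{aligned}
\mathrm{ops}(P_{|cat\rangle(m)}) ={}& 6\,\mathit{hCNOT}_{(m-1)} + 2\,\mathit{vCNOT}_{(m-1)} + [\ldots]\,\mathit{hSWAP}_{(m-1)} \\
&+ 8\,\mathit{vSWAP}_{(m-1)} + P_{|+\rangle(m-1)} + 7P_{|0\rangle(m-1)} + [\ldots] + \mathcal{EC}_{(m)}
\end{aligned} \tag{13a}$$

$$\begin{aligned}
\mathrm{time}(P_{|cat\rangle(m)}) ={}& \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}) + 2\,\mathit{vCNOT}_{(m-1)} \\
&+ 3\,\mathit{hCNOT}_{(m-1)} + \max(\mathcal{M}_{Z(m-1)}, \mathit{hSWAP}_{(m-1)}, \mathit{vSWAP}_{(m-1)}) \\
&+ 2\max(\mathit{hSWAP}_{(m-1)}, \mathit{vSWAP}_{(m-1)}) \\
&+ \mathit{hSWAP}_{(m-1)} + \mathit{vSWAP}_{(m-1)} + \mathcal{EC}_{(m)}
\end{aligned} \tag{13b}$$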
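
The ops(...) recurrences can be unrolled mechanically down to Level 0. The Python sketch below does this for the operations defined in Equations (4a), (6a), (6b), (7a), (7b), (8a) with H substituted for X, (9a), (11a), and (12a). It is only an illustration of the recursion, not the QuRE implementation: several coefficients are hard to read in the source, so the values in the table (in particular 112 and 11 in (6a), and the H and measurement terms in (11a)) should be treated as assumptions, and the names `RULES` and `ops` are hypothetical.

```python
from collections import Counter
from functools import lru_cache

# Level-(m-1) operations consumed by one Level-m operation, excluding the
# trailing EC(m) term. Coefficients are assumed readings of the equations.
PREP = {"hCNOT": 7, "vCNOT": 7, "hSWAP": 9, "vSWAP": 11, "P+": 4, "P0": 6, "MX": 3}
RULES = {
    "EC": {"hCNOT": 14, "vCNOT": 28, "H": 7, "hSWAP": 18, "vSWAP": 15,
           "P+": 8, "P0": 12, "MX": 20},                   # Equation (4a)
    "hCNOT": {"vCNOT": 7, "hSWAP": 112, "vSWAP": 11},      # Equation (6a)
    "vCNOT": {"vCNOT": 7, "hSWAP": 12, "vSWAP": 70},       # Equation (6b)
    "hSWAP": {"hSWAP": 56},                                # Equation (7a)
    "vSWAP": {"vSWAP": 42},                                # Equation (7b)
    "H": {"H": 7},                                         # Equation (8a), H for X
    "MX": {"MX": 7},                                       # Equation (9a)
    "P+": dict(PREP, H=7),                                 # Equation (11a)
    "P0": dict(PREP),                                      # Equation (12a)
}


@lru_cache(maxsize=None)
def ops(gate, level):
    """Physical (Level 0) operation counts for one `gate` at the given level."""
    if level == 0:
        return Counter({gate: 1})
    total = Counter()
    for lower_gate, coeff in RULES[gate].items():
        for physical_gate, count in ops(lower_gate, level - 1).items():
            total[physical_gate] += coeff * count
    if gate != "EC":
        total.update(ops("EC", level))  # every logical operation ends with EC(m)
    return total


for m in (1, 2):
    print(m, sum(ops("hSWAP", m).values()))
```

The cached results are shared between calls, so callers should treat the returned counters as read-only. A time(...) estimate could be unrolled in the same way by replacing the sums with the max expressions of the corresponding equations.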

Fault-Tolerant  S  Gates 

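$$\begin{aligned}
\mathrm{ops}(S_{(m)}) ={}& P_{|0\rangle(m)} + 4S_{trans(m)} + 2P_{|cat\rangle(m)} + 14\,\mathit{hCNOT}_{(m-1)} \\
&+ \mathit{hCNOT}_{(m)} + 14\mathcal{M}_{X(m-1)} + \mathcal{M}_{Z(m)} + \mathcal{EC}_{(m)}
\end{aligned} \tag{14a}$$

$$\begin{aligned}
\mathrm{time}(S_{(m)}) ={}& \max\big((P_{|0\rangle(m)} + S_{trans(m)}), P_{|cat\rangle(m)}\big) + \max(S_{trans(m)}, P_{|cat\rangle(m)}) \\
&+ 2\,\mathit{hCNOT}_{(m-1)} + \mathit{hCNOT}_{(m)} + 2\max(S_{trans(m)}, \mathcal{M}_{X(m-1)}) + \mathcal{M}_{Z(m)} + \mathcal{EC}_{(m)}
\end{aligned} \tag{14b}$$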


Fault-Tolerant  T  Gates 

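$$\begin{aligned}
\mathrm{ops}(T_{(m)}) ={}& P_{|0\rangle(m)} + 4T_{trans(m)} + 2P_{|cat\rangle(m)} + 14\,\mathit{hCNOT}_{(m-1)} \\
&+ \mathit{hCNOT}_{(m)} + 14\mathcal{M}_{X(m-1)} + \mathcal{M}_{Z(m)} + \mathcal{EC}_{(m)}
\end{aligned} \tag{15a}$$

$$\begin{aligned}
\mathrm{time}(T_{(m)}) ={}& \max\big((P_{|0\rangle(m)} + T_{trans(m)}), P_{|cat\rangle(m)}\big) + \max(T_{trans(m)}, P_{|cat\rangle(m)}) \\
&+ 2\,\mathit{hCNOT}_{(m-1)} + \mathit{hCNOT}_{(m)} + 2\max(T_{trans(m)}, \mathcal{M}_{X(m-1)}) + \mathcal{M}_{Z(m)} + \mathcal{EC}_{(m)}
\end{aligned} \tag{15b}$$

Fig. 10. 3 x 3 representation of the Bacon-Shor code.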


7  Error  Correction  with  the  Bacon-Shor  Code 

In this section we estimate the overhead imposed by the Bacon-Shor [[9,1,3]] error-correcting
code. This section is organized as follows. First we describe basic properties of the code and
the basic circuits used to implement a universal gate set using an ancilla factory model, i.e., a
zone of the computer in charge of distilling the high-quality ancillas required to execute S and T
gates. Then we build the recursive relations resulting from the concatenated code structure
in order to quantify the total number of gates and total time required by each elementary
operation.

7.1 The Bacon-Shor Code

Logical qubits in the [[9,1,3]] Bacon-Shor subsystem code are encoded using nine qubits. The
Bacon-Shor code is a subsystem stabilizer code [31], best pictured as a 3 x 3 array, with
stabilizers
that map the encoded logical states to themselves. Logical X (or Z) Pauli operators can be
executed via a homogeneous X (or Z) pulse modulo stabilizer operations, which in particular
implies that logical X (or Z) operators can be implemented by a homogeneous action on one
row (or column) of the array. There is no unique way of writing the codewords given the
gauge freedom induced by the subsystem structure. There exists a set of gauge operators
$O_g$ consisting of pairs of X (or Z) Pauli operators acting on the same column (or row)
which commute with the stabilizer and logical operators and thus act trivially on the encoded
information.

The Bacon-Shor code can be concatenated: each Level m encoded qubit block is built
using nine Level m - 1 logical qubits and gates. Level 1 blocks are at the lowest level of
encoding and they are built of Level 0 qubits and gates provided at the physical level. Each
of the circuits below shows how to execute an operation at level m of concatenation using
gates of level m - 1.

Fig. 11. Illustration of the state preparation with the Bacon-Shor code.


We will use the same notation as in the previous section. Namely, standard gates,
measurement operations and state preparation at the m-th level of concatenation are denoted
as before, for example by $X_{(m)}$, $\mathit{CNOT}_{(m)}$, $T_{(m)}$, $\mathcal{M}_{X(m)}$, and $P_{|0\rangle(m)}$. The
error correction and detection are represented by $\mathcal{EC}_{(m)}$ and $\mathcal{ED}_{(m)}$. Operations required to
implement a gate are denoted ops(...), and the required time is time(...).

First we consider the preparation of encoded states. Figure 11 shows how to encode the
logical $|0\rangle$ and $|+\rangle$ at level m of concatenation, using gates at level m - 1 as resources.

Having  described  how  to  prepare  encoded  states,  let  us  proceed  to  show  how  one  can 
manipulate  them,  and  how  to  implement  a  set  of  universal  gates. 

The Bacon-Shor code belongs to the family of Calderbank-Shor-Steane (CSS) codes, a
family of codes with transversal implementation of most gates, including the CNOT gate.
A transversal CNOT gate at Level m can be obtained by applying nine CNOT gates to the
corresponding control and target qubits at Level m - 1. Figure 12 shows a fault-tolerant
implementation of the CNOT gate. Gates that can be performed transversally also include
the single-qubit Pauli gates. The H gate is transversal modulo a π rotation of the 3 x 3 array
along the diagonal of the array, i.e., a relabelling of the qubits $q_{ij} \leftrightarrow q_{ji}$. Figure 12 also shows
how to perform measurement.

The $S$ gate can be implemented using the circuit in Figure 13. It uses an ancilla in the
state $|{+i}\rangle = (|0\rangle + i|1\rangle)/\sqrt{2}$ as a resource to generate the required gate. In turn, an encoded $|{+i}\rangle$
state at any level of concatenation can be obtained via the injection circuit in Figure 14, which
teleports an arbitrary lower-level state, in this case $|{+i}\rangle$, into an encoded state $|{+i}\rangle$
at the cost of decoding (via $\mathcal{D}$) an encoded Bell pair. Moreover, since the injection circuit is
not fault-tolerant, a higher-fidelity $|{+i}\rangle$ has to be distilled via multiple successful rounds of
the circuit in Figure 16.

In order to have a universal gate set, we also need the $\pi/8$ gate, called the $T$ gate. A
fault-tolerant  version  of  this  gate  cannot  be  constructed  transversally.  A  fault-tolerant  T  gate 
is  shown  in  Figure  18.  This  gate  sequence  was  originally  constructed  in  [31]  using  one-bit 
Fig. 12. Fault-tolerant implementation of the CNOT gate (on the left), the single-qubit gates $X$,
$Y$, $Z$ and $H$ (top right corner), and $Z$-basis measurement (bottom right).


Fig. 13. Implementation of the $S$ gate via the use of the $|{+i}\rangle$ resource ancilla.


Fig. 14. Injection of an arbitrary encoded state. In this figure the injection of the $|{+i}\rangle$ state is
shown.


teleportation. The gate sequence teleports the state $|\psi\rangle$ from the data block to the ancilla and
applies the $T$ gate to the state. The ancilla state $T|+\rangle$ is prepared using the state injection
method described before, followed by several successful rounds of the distillation circuit shown
in Figures 16 and 19.

To correct errors during the computation, one uses the circuit depicted in Figure 21. This
circuit corrects $Z$ and $X$ errors independently. In essence, the circuit extracts the measurement
outcomes of the $X$- and $Z$-type gauge operators independently, and with various classical
parity operations on the measurement outcomes it is possible to construct the recovery
procedure $\mathcal{R}_X$ or $\mathcal{R}_Z$. This is detailed in Ref. [31].

Note that the error detection operation, denoted $\mathcal{ED}$, differs from the error correction $\mathcal{EC}$
Fig. 15. Decoder circuit $\mathcal{D}$ from level $s$ to level $s-1$. Injecting a state at level $m$ requires a decoder
from level $m$ to level 0.


Fig. 16. Distillation process for $|{+i}\rangle$ states. The process is successful when the measurement outcome
is 0. If it is 1, the states must be discarded and the process restarted.


Fig. 17. Twirl gate for the $|{+i}\rangle$ state shown in the $|{+i}\rangle$ distillation circuit.


Fig. 18. Implementation of a $T$ gate using the state $T|+\rangle$ as a resource ancilla. This state is injected
with the circuit in Fig. 14. The distillation and twirl procedures are different from those of the $|{+i}\rangle$
state.


depicted in Figure 21 by removing the two error-correction operations $\mathcal{R}_X$ and $\mathcal{R}_Z$.
Fig. 19. Distillation circuit for the $T|+\rangle$ state.


Fig. 20. Twirl operation for the $T|+\rangle$ state.

7.2  Resources  with  m  Levels  of  Concatenation 

Here we estimate the resources needed to perform a single logical gate at Level $m$ of concatenation.
Specifically, we calculate the number of Level 0 gates, the total number of ancillas
used  during  the  computation,  and  the  total  gate  time  taking  parallelization  into  account.  As 
a  general  rule  we  shall  take  into  account  the  presence  of  the  input  and  output  error-correction 
Fig. 21. Illustration of the Steane-EC syndrome extraction method.


routines inserted in any gate, and this should be understood for every circuit shown above. For
example, a gate $U_{(m)}$ at level $m$ is implemented as the sequence $\mathcal{EC}_{(m)}|_{\text{inputs}}\, U_{(m)}\, \mathcal{EC}_{(m)}|_{\text{outputs}}$.
Moreover, when multiple gates act in sequence, $A_{(m)} B_{(m)} \ldots$, each gate is assumed to be of
the above form, i.e., $(\mathcal{EC}_{(m)}|_{\text{inputs}}\, A_{(m)}\, \mathcal{EC}_{(m)}|_{\text{outputs}})(\mathcal{EC}_{(m)}|_{\text{inputs}}\, B_{(m)}\, \mathcal{EC}_{(m)}|_{\text{outputs}}) \ldots$. Since
EC routines acting back-to-back are only detrimental, i.e. they give the same correction effect while using
more gates, they can be contracted, yielding $\mathcal{EC}_{(m)} A_{(m)} \mathcal{EC}_{(m)} B_{(m)} \mathcal{EC}_{(m)} \ldots$. While this
contraction process can be executed for every level of concatenation below the one analyzed in
order to reduce the number of gates, here we opt to contract only at the analyzed level.
Measurements only have input EC routines, while preparations typically have only output EC
routines. Furthermore, we account for the fact that two-qubit interactions must be
nearest-neighbor only, and our resource estimation counts the necessary SWAP gates.
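
The contraction rule can be illustrated with a minimal Python sketch. It is not part of the QuRE toolbox; the function names `contract_ec` and `ec_saved` and the string label `"EC"` are assumptions made for illustration. Each gate is first wrapped in its input and output EC routines, and adjacent EC routines are then merged into one.

```python
def contract_ec(gates):
    """Return the contracted level-m sequence EC, G1, EC, G2, ..., EC.

    Every gate is expanded to (EC, gate, EC); two EC routines that end up
    back-to-back are merged into a single EC.
    """
    expanded = []
    for gate in gates:
        expanded.extend(["EC", gate, "EC"])

    contracted = []
    for op in expanded:
        # Skip an EC that immediately follows another EC.
        if op == "EC" and contracted and contracted[-1] == "EC":
            continue
        contracted.append(op)
    return contracted


def ec_saved(gates):
    """Number of EC routines removed by the contraction."""
    before = 2 * len(gates)
    after = contract_ec(gates).count("EC")
    return before - after
```

For a sequence of n gates the uncontracted form holds 2n EC routines and the contracted form n + 1, so n - 1 routines are saved.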
7.2.1  Physical Layout of Qubits

In order to appropriately quantify the resources required, we describe the physical layout
being used. We use the tile structure of Ref. [12], designed to minimize the number
of SWAP operations used during error-correction routines, and thus preserve a high error
threshold. Extra qubits, labelled by $a$, are placed in order to account for nearest-neighbour-only
interactions in the following fashion:
For details on the exact gate sequences being used we refer the reader to the original paper [12].
The threshold obtained for this layout, and the corresponding circuits implementing the
error-correction subroutines, is $p_{th} = 1.3 \times 10^{-5}$.

7.2.2  Ancilla  Factory 

We  will  use  what  is  known  as  the  ancilla  factory  model,  in  which  sections  of  the  computer  are 
devoted  exclusively  to  producing  and  distilling  the  special  ancillas  required  to  implement  the  T 
and  S  gates  at  the  highest  level  of  concatenation.  At  this  stage,  we  assume  that  all  distillation 
rounds  are  successful  while  recognizing  that  the  distillation  process  is  a  probabilistic  process. 
Accounting  for  the  probabilistic  character  of  the  distillation  process  can  be  done  once  the 
success  rate  and  target  fidelity  of  the  distilled  states  are  fixed,  on  a  case  by  case  basis.  This 
finer  analysis  is  not  done  in  this  report. 

In order to determine how many successful distillation rounds are required, we take into
account the error rate of the physical-level input state and the physical gate error rates (bounded
by $p^{(0)}$) to compute the output error rate after $r$ successful rounds of distillation. The $r_{\max}$
for which the output error rate is below the error rate of Clifford gates at the highest level of
concatenation $L$, given by the well-known equation [2] $p^{(L)} = p_{th}\,(p^{(0)}/p_{th})^{2^{L}}$, is the one we
use in our simulations. We find that $r_{\max} = 5$ and $r_{\max} = 3$ distillation rounds are required
for $T|+\rangle$ and $|{+i}\rangle$ ancillas respectively.
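
As a hedged illustration of the Clifford-gate error rate used as the distillation target, the sketch below evaluates the standard doubly exponential concatenation formula $p^{(L)} = p_{th}(p^{(0)}/p_{th})^{2^{L}}$. The function names and the example physical error rate are assumptions for illustration only; the sketch does not model the distillation rounds themselves.

```python
def logical_error_rate(p0, p_th, level):
    """Error rate after `level` levels of concatenation.

    Assumes the standard form p(L) = p_th * (p0 / p_th) ** (2 ** L).
    """
    return p_th * (p0 / p_th) ** (2 ** level)


def min_levels(p0, p_th, target, max_level=20):
    """Smallest concatenation level whose error rate is at or below `target`."""
    if p0 >= p_th:
        raise ValueError("physical error rate must be below threshold")
    for level in range(max_level + 1):
        if logical_error_rate(p0, p_th, level) <= target:
            return level
    raise ValueError("target not reached within max_level levels")
```

For example, with a hypothetical physical error rate of 1e-6 and a threshold of 1.3e-5, `min_levels(1e-6, 1.3e-5, 1e-10)` searches for the lowest level at which the logical error rate drops to 1e-10 or below.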

7.2.3  The  Recursive  Relations 

We  express  these  resources  for  a  single  logical  gate  at  Level  m: 

Fault-Tolerant  Measurements 

Measurement  in  the  Z-basis: 


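\begin{align}
\mathrm{ops}(M_{Z(m)}) &= 9\,M_{Z(m-1)} + \mathcal{EC}_{(m)} \tag{17a}\\
\mathrm{time}(M_{Z(m)}) &= M_{Z(m-1)} + \mathcal{EC}_{(m)} \tag{17b}
\end{align}

Measurement in the X-basis:

\begin{align}
\mathrm{ops}(M_{X(m)}) &= 9\,M_{X(m-1)} + \mathcal{EC}_{(m)} \tag{18a}\\
\mathrm{time}(M_{X(m)}) &= M_{X(m-1)} + \mathcal{EC}_{(m)} \tag{18b}
\end{align}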

Fault-Tolerant  |0)  and  |+)  Preparation 

Preparation  of  |0)  states: 


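\begin{align}
\mathrm{ops}(P_{|0\rangle(m)}) &= 24\,P_{|0\rangle(m-1)} + 12\,P_{|+\rangle(m-1)} + 51\,\mathrm{CNOT}_{(m-1)} + \mathcal{EC}_{(m)} \tag{19a}\\
\mathrm{time}(P_{|0\rangle(m)}) &= \max(P_{|0\rangle(m-1)}, P_{|+\rangle(m-1)}) + \max(P_{|0\rangle(m-1)}, \mathrm{CNOT}_{(m-1)}) \notag\\
&\quad + \mathrm{CNOT}_{(m-1)} + \max(M_{Z(m-1)}, \mathrm{CNOT}_{(m-1)}) + \max(P_{|0\rangle(m-1)}, \mathrm{SWAP}_{(m-1)}) \tag{19b}\\
&\quad + 2\,\mathrm{SWAP}_{(m-1)} + \mathrm{CNOT}_{(m-1)} + M_{X(m-1)} + \mathcal{EC}_{(m)} \notag
\end{align}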

Preparation  of  |+)  states: 


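\begin{align}
\mathrm{ops}(P_{|+\rangle(m)}) &= 24\,P_{|+\rangle(m-1)} + 12\,P_{|0\rangle(m-1)} + 51\,\mathrm{CNOT}_{(m-1)} + \mathcal{EC}_{(m)} \tag{20a}\\
\mathrm{time}(P_{|+\rangle(m)}) &= \max(P_{|0\rangle(m-1)}, P_{|+\rangle(m-1)}) + \max(P_{|0\rangle(m-1)}, \mathrm{CNOT}_{(m-1)}) \notag\\
&\quad + \mathrm{CNOT}_{(m-1)} + \max(M_{Z(m-1)}, \mathrm{CNOT}_{(m-1)}) + \max(P_{|0\rangle(m-1)}, \mathrm{SWAP}_{(m-1)}) \tag{20b}\\
&\quad + 2\,\mathrm{SWAP}_{(m-1)} + \mathrm{CNOT}_{(m-1)} + M_{X(m-1)} + \mathcal{EC}_{(m)} \notag
\end{align}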

Fault-Tolerant  Pauli  Gates 

Fault-tolerant  implementation  of  the  X  gate: 


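\begin{align}
\mathrm{ops}(X_{(m)}) &= 3\,X_{(m-1)} + \mathcal{EC}_{(m)} \tag{21a}\\
\mathrm{time}(X_{(m)}) &= X_{(m-1)} + \mathcal{EC}_{(m)} \tag{21b}
\end{align}

Fault-tolerant implementation of the Z gate:

\begin{align}
\mathrm{ops}(Z_{(m)}) &= 3\,Z_{(m-1)} + \mathcal{EC}_{(m)} \tag{22a}\\
\mathrm{time}(Z_{(m)}) &= Z_{(m-1)} + \mathcal{EC}_{(m)} \tag{22b}
\end{align}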

CNOT  Gates 

Fault-tolerant  implementation  of  the  CNOT  gate  (horizontal  and  vertical): 


ops(CNOT{m) )  =  9C7VOr(m_i)  P  144SWAP(m  -  1)  P  £C(m) 
time{X(jn ))  =  CNOT(m_\}  P  8iSWAP(m_i)  P  £C(m) 


(23a) 

(23b) 
Fault-Tolerant H Gate

Fault-tolerant  implementation  of  the  H  gate: 


ops(if(m))  —  +  4QvSW  4-  2£C(m)  (24a) 

time(H(m) )  =  a)  +  40SW  AP^m_^  +  3  £C(m)  (24b) 


SWAP  Operation 

The swap operations $\mathrm{hSWAP}_{(m)}$ and $\mathrm{vSWAP}_{(m)}$ swap one of the 28 sub-blocks inside a
single $m$-th level block with another of the 28 sub-blocks that is adjacent on the right/left
and top/bottom, respectively:

ops(hSWAP{m) )  =  lMSWAP^-v  +  £C(m)  (25a) 

ops(vSWAP(m) )  =  +  £C(m)  (25b) 

time(hSWAPim))  =  8SWAP(m_1}  +  £C{m)  (25c) 

time(vSWAP(m))  =  %SWAP{m_x)  +  £C{m)  (25d) 


Error  Detection  and  Correction  with  Steane-EC-type  syndrome  extraction 

A  fault-tolerant  implementation  of  quantum  error-correcting  protocol  for  a  single  logical  code 
block  : 


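\begin{align}
\mathrm{ops}(\mathcal{EC}_{(m)}) &= 9\,P_{|+\rangle(m-1)} + 9\,P_{|0\rangle(m-1)} + 29\,\mathrm{CNOT}_{(m-1)} \tag{26a}\\
&\quad + 9\,M_{X(m-1)} + 9\,M_{Z(m-1)} + 12\,\mathrm{SWAP}_{(m)} + X_{(m-1)} + Z_{(m-1)} \notag\\
\mathrm{time}(\mathcal{EC}_{(m)}) &= \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}) \notag\\
&\quad + \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}, \mathrm{CNOT}_{(m-1)}) + \max(\mathrm{SWAP}_{(m-1)}, \mathrm{CNOT}_{(m-1)}) \notag\\
&\quad + \max(\mathrm{SWAP}_{(m-1)}, \mathrm{CNOT}_{(m-1)}, M_{X(m-1)}) \tag{26b}\\
&\quad + \max(\mathrm{SWAP}_{(m-1)}, \mathrm{CNOT}_{(m-1)}, M_{X(m-1)}, M_{Z(m-1)}) \notag\\
&\quad + \max(\mathrm{CNOT}_{(m-1)}, M_{X(m-1)}, M_{Z(m-1)}) \notag\\
&\quad + M_{X(m-1)} + \max(X_{(m-1)}, Z_{(m-1)}) \notag
\end{align}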

Error  detection  for  a  single  logical  code  block: 

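\begin{align}
\mathrm{ops}(\mathcal{ED}_{(m)}) &= 9\,P_{|+\rangle(m-1)} + 9\,P_{|0\rangle(m-1)} + 29\,\mathrm{CNOT}_{(m-1)} \tag{27a}\\
&\quad + 9\,M_{X(m-1)} + 9\,M_{Z(m-1)} + 12\,\mathrm{SWAP}_{(m)} \notag\\
\mathrm{time}(\mathcal{ED}_{(m)}) &= \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}) \notag\\
&\quad + \max(P_{|+\rangle(m-1)}, P_{|0\rangle(m-1)}, \mathrm{CNOT}_{(m-1)}) + \max(\mathrm{SWAP}_{(m-1)}, \mathrm{CNOT}_{(m-1)}) \notag\\
&\quad + \max(\mathrm{SWAP}_{(m-1)}, \mathrm{CNOT}_{(m-1)}, M_{X(m-1)}) \tag{27b}\\
&\quad + \max(\mathrm{SWAP}_{(m-1)}, \mathrm{CNOT}_{(m-1)}, M_{X(m-1)}, M_{Z(m-1)}) \notag\\
&\quad + \max(\mathrm{CNOT}_{(m-1)}, M_{X(m-1)}, M_{Z(m-1)}) + M_{X(m-1)} \notag
\end{align}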
In order to complete a universal gate set we are missing a fault-tolerant implementation of
the $S$ and $T$ gates. The implementation of each of these gates requires a special resource ancilla. Let
us first do the counting for the circuits implementing the gates assuming the corresponding
ancilla is provided.

Fault-Tolerant  S 

Fault-tolerant  implementation  of  the  S  gate: 


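\begin{align}
\mathrm{ops}(S_{(m)}) &= P_{|{+i}\rangle(m)} + \mathrm{CNOT}_{(m)} + \mathrm{CPHASE}_{(m)} \tag{28a}\\
\mathrm{time}(S_{(m)}) &= P_{|{+i}\rangle(m)} + \mathrm{CNOT}_{(m)} + \mathrm{CPHASE}_{(m)} \tag{28b}
\end{align}

Here $\mathrm{CPHASE}_{(m)}$ denotes the controlled-phase gate. This two-qubit gate can be obtained by
conjugating a CNOT gate with a Hadamard gate acting on the target of the CNOT gate.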
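
The conjugation identity can be checked numerically. The following is a small self-contained sketch in plain Python (the helper names are assumptions, and the matrices are written in the basis $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the first qubit as control): it forms $(I \otimes H)\,\mathrm{CNOT}\,(I \otimes H)$ and compares it with $\mathrm{diag}(1, 1, 1, -1)$.

```python
import math


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def cphase_from_cnot():
    """Conjugate CNOT by a Hadamard on the target qubit."""
    s = 1.0 / math.sqrt(2.0)
    hadamard = [[s, s], [s, -s]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    cnot = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 1.0],
            [0.0, 0.0, 1.0, 0.0]]
    h_on_target = kron(identity, hadamard)
    return matmul(matmul(h_on_target, cnot), h_on_target)
```

The result is diagonal with a single -1 entry on $|11\rangle$, which is the controlled-phase (controlled-$Z$) gate.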

Fault-Tolerant  T 

Fault-tolerant  implementation  of  the  T  gate: 

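\begin{align}
\mathrm{ops}(T_{(m)}) &= P_{T|+\rangle(m)} + \mathrm{CNOT}_{(m)} + M_{Z(m)} + S_{(m)} \tag{29a}\\
\mathrm{time}(T_{(m)}) &= P_{T|+\rangle(m)} + \mathrm{CNOT}_{(m)} + M_{Z(m)} + S_{(m)} \tag{29b}
\end{align}

where $T|+\rangle = R_z(\pi/8)|+\rangle$.

Both states can be injected via the circuit in Figure 14. The gate count for such a circuit is as
follows:

Injection circuit for a state $|\Psi\rangle$

Implementation of the $\mathrm{Inj}_{|\Psi\rangle}$ injection circuit: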


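\begin{align}
\mathrm{ops}(\mathrm{Inj}_{|\Psi\rangle}) &= P_{|+\rangle(m)} + P_{|0\rangle(m)} + P_{|\Psi\rangle(0)} + \mathrm{CNOT}_{(m)} \tag{30a}\\
&\quad + \mathcal{D}_{(m,0)} + M_{Z(0)} + M_{X(0)} + X_{(m)} + Z_{(m)} \notag\\
\mathrm{time}(\mathrm{Inj}_{|\Psi\rangle}) &= \max(P_{|+\rangle(m)}, P_{|0\rangle(m)}, P_{|\Psi\rangle(0)}) + \mathrm{CNOT}_{(m)} + \mathcal{D}_{(m,0)} \tag{30b}\\
&\quad + \mathrm{CNOT}_{(0)} + \max(M_{Z(0)}, M_{X(0)}) + X_{(m)} + Z_{(m)} \notag
\end{align}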


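Decoder circuit $\mathcal{D}_{(m,0)}$

Such a circuit can be implemented in a level-by-level fashion, namely:

$$\mathcal{D}_{(m,0)} = \mathcal{D}_{(1,0)}\, \mathcal{D}_{(2,1)} \cdots \mathcal{D}_{(m-1,m-2)}\, \mathcal{D}_{(m,m-1)}.$$

Implementation of the $\mathcal{D}_{(s,s-1)}$ gate: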


ops(T>(s,s-i))  —  S>CNOT^s_i')  +  2MZ(S_1)  +  6Mx(s_i) 
time(V(S'S_  i))  =  2,CNOT^_i)  +  max(Mz(s_1),  Mx(s_i)) 


(31a) 

(31b) 
Since the injection circuit is not fault-tolerant, one must further improve the quality of
the injected states by a distillation process. Higher-fidelity resource states are obtained
conditioned on successful distillation procedures. Here we do the gate count of a single distillation
step for each of the two resource states.

Distillation circuits for $|{+i}\rangle$ and $T|+\rangle$

This circuit assumes the input of noisy ancillas obtained via $\mathrm{Inj}_{|{+i}\rangle(m)}$, and thus we do not
count this gate as part of the circuit.

Implementation of the $\mathrm{Dist}_{|{+i}\rangle(m)}$ circuit:


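\begin{align}
\mathrm{ops}(\mathrm{Dist}_{|{+i}\rangle(m)}) &= 2\,(\text{twirl-}|{+i}\rangle)_{(m)} + \mathrm{CPHASE}_{(m)} + \mathrm{CNOT}_{(m)} + M_{X(m)} \tag{32a}\\
\mathrm{time}(\mathrm{Dist}_{|{+i}\rangle(m)}) &= (\text{twirl-}|{+i}\rangle)_{(m)} + \mathrm{CPHASE}_{(m)} + \mathrm{CNOT}_{(m)} + M_{X(m)} \tag{32b}
\end{align}

Implementation of the $(\text{twirl-}|{+i}\rangle)_{(m)}$ circuit:

\begin{align}
\mathrm{ops}((\text{twirl-}|{+i}\rangle)_{(m)}) &= M_{Z(0)} + Y_{(m)} \tag{33a}\\
\mathrm{time}((\text{twirl-}|{+i}\rangle)_{(m)}) &= M_{Z(m)} + Y_{(m)} \tag{33b}
\end{align}

where $Y = iXZ$.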

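This circuit assumes the input of noisy ancillas obtained via $\mathrm{Inj}_{T|+\rangle(m)}$, and thus we do not
count this gate as part of the circuit.

Implementation of the $\mathrm{Dist}_{T|+\rangle(m)}$ circuit:

\begin{align}
\mathrm{ops}(\mathrm{Dist}_{T|+\rangle(m)}) &= 15\,(\text{twirl-}T|+\rangle)_{(m)} + 32\,\mathrm{CNOT}_{(m)} + 14\,M_{X(m)} \tag{34a}\\
\mathrm{time}(\mathrm{Dist}_{T|+\rangle(m)}) &= (\text{twirl-}T|+\rangle)_{(m)} + 21\,\mathrm{CNOT}_{(m)} + M_{X(m)} \tag{34b}
\end{align}

Implementation of the $(\text{twirl-}T|+\rangle)_{(m)}$ circuit:

\begin{align}
\mathrm{ops}((\text{twirl-}T|+\rangle)_{(m)}) &= M_{Z(0)} + S_{(m)} + X_{(m)} \tag{35a}\\
\mathrm{time}((\text{twirl-}T|+\rangle)_{(m)}) &= M_{Z(m)} + S_{(m)} + X_{(m)} \tag{35b}
\end{align}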

With  these  recursive  relations,  the  resource  count  can  be  carried  out. 
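
As one illustration of how such a recursion unrolls, the sketch below evaluates the Z-basis measurement count, ops($M_{Z(m)}$) = 9 $M_{Z(m-1)}$ + $\mathcal{EC}_{(m)}$, down to Level 0. It is a minimal sketch and not the QuRE implementation: `make_mz_counter`, `ec_ops`, and `level0_cost` are hypothetical names, and the cost of the error-correction routine is supplied by the caller as a function of the level rather than computed from its own recursion.

```python
from functools import lru_cache


def make_mz_counter(ec_ops, level0_cost=1):
    """Build a function returning ops(M_Z(m)) in units of Level-0 operations.

    ec_ops(m) is an assumed, user-supplied operation count for EC at level m.
    """
    @lru_cache(maxsize=None)
    def ops_mz(m):
        if m == 0:
            return level0_cost
        # Nine lower-level measurements plus one level-m error correction.
        return 9 * ops_mz(m - 1) + ec_ops(m)

    return ops_mz
```

With the EC cost set to zero the count reduces to $9^m$, which makes the exponential growth in the number of concatenation levels explicit.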

8  Error Correction with Knill's Post-Selection Scheme


Now we analyze the resource estimates of the quantum computer using Knill's post-selection
scheme [32,33]. More details can be found in [34]. Knill's post-selection scheme
concatenates $(L-1)$ levels of an error-detecting code $C_{ed}$ with an error-correcting code $C_{ec}$
at the top level. We use $C_{ed}^{m}$ to denote the Level-$m$ encoding of the $C_{ed}$ code.

8.1  Knill's Post-Selection Scheme

The error-detecting code $C_{ed}$ is the [4, 2, 2] code with a stabilizer group generated by $XXXX$
and $ZZZZ$. This quantum code encodes two logical qubits and can simultaneously detect
any single-qubit $X$ error and any single-qubit $Z$ error. In Knill's scheme, we use only one
of the logical qubits and treat the other as a spectator qubit. The logical operators are

\begin{align*}
X_L &= XXII, & Z_L &= ZIZI,\\
X_S &= IXIX, & Z_S &= IIZZ,
\end{align*}

where $L$ and $S$ are labels for the logical and the spectator qubits, respectively. Thus

$$Y_L = i X_L Z_L = YXZI.$$
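
The algebraic properties claimed for these operators can be checked with a short sketch that represents Pauli operators as strings and ignores overall phases. The helper names are assumptions for illustration. Two Pauli strings commute exactly when they differ on an even number of positions where both are non-identity.

```python
STABILIZERS = ["XXXX", "ZZZZ"]
LOGICALS = {"X_L": "XXII", "Z_L": "ZIZI", "X_S": "IXIX", "Z_S": "IIZZ"}


def commute(p, q):
    """True if Pauli strings p and q commute (phases ignored)."""
    clashes = sum(1 for a, b in zip(p, q)
                  if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def single_qubit_errors(n=4):
    """Yield every weight-one Pauli error on n qubits."""
    for pos in range(n):
        for pauli in "XYZ":
            yield "I" * pos + pauli + "I" * (n - pos - 1)


def detected(error):
    """An error is detected if it anticommutes with some stabilizer generator."""
    return any(not commute(error, s) for s in STABILIZERS)
```

Running `all(detected(e) for e in single_qubit_errors())` confirms that every single-qubit Pauli error flips at least one of the two stabilizer generators, which is the error-detecting property used in the scheme.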


The states $|0\rangle_L |+\rangle_S$ and $|+\rangle_L |0\rangle_S$ correspond to the logical $|0\rangle$ and logical $|+\rangle$, respectively.
These states can be fault-tolerantly prepared by the circuits in Figure 22.

To perform fault-tolerant error detection ($\mathcal{ED}$) of $C_{ed}^{m}$, the circuits in Figure 23 are used
depending on the state of the spectator qubit. We choose $\mathcal{ED}_0$ or $\mathcal{ED}_+$ when the spectator
qubit is $|+\rangle_S$ or $|0\rangle_S$, respectively. Thus the state of the spectator qubit alternates between
$|+\rangle_S$ and $|0\rangle_S$ after each error detection block. According to [32,33], the $\mathcal{ED}_0$ gate is better
suited for detecting $Z$ errors, while the $\mathcal{ED}_+$ gate is better suited for detecting $X$ errors.
If no errors are detected, the measurement outcomes of the first two code blocks
determine the logical Pauli operator that needs to be applied to the second ancilla block to
complete the teleportation (this is not shown in Figure 23).
Fig. 22. Fault-tolerant preparation of the logical


The top-level error-correcting code $C_{ec}$ can be the Steane code or the Bacon-Shor code,
and we choose the Steane code in this work. The logical $|0\rangle$ of the concatenation of $C_{ed}^{m}$ and
$C_{ec}$ is generated by the circuit in Figure 24. If an error is detected at any error detection step
at any level of concatenation, the preparation of the logical $|0\rangle$ should be restarted. We use Knill's
syndrome extraction method shown in Figure 25 at the top level of concatenation.

Fig. 23. Circuits for fault-tolerant quantum error detection. Top: $\mathcal{ED}_0$. Bottom: $\mathcal{ED}_+$.

The logical CNOT gate between different code blocks can be done transversally by applying
bitwise CNOT gates. The SWAP of qubits 2 and 3 implements the SWAP of the
logical qubit and the spectator qubit. We call this the inner SWAP, which is different from
the outer SWAP between two code blocks. The logical Hadamard gate is implemented by
transversally applying the Hadamard gates, followed by an inner SWAP. The inner SWAP
does not need to be applied physically. Instead, we switch the labels of the qubits and keep track of
them. We assume this can be done efficiently.
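
A small sketch makes the action of the logical Hadamard and the label bookkeeping concrete. It works on Pauli strings with signs ignored, and the function names are assumptions for illustration. The transversal Hadamard exchanges $X$ and $Z$ on every qubit, and the inner SWAP exchanges qubits 2 and 3; `relabel` shows the same exchange done purely on a list of labels rather than on the qubits.

```python
# Action of a single-qubit Hadamard on Pauli operators, up to sign.
H_MAP = {"I": "I", "X": "Z", "Z": "X", "Y": "Y"}


def transversal_h(pauli):
    """Apply H to every qubit of a Pauli string."""
    return "".join(H_MAP[c] for c in pauli)


def inner_swap(pauli):
    """Exchange qubits 2 and 3 (string positions 1 and 2)."""
    q = list(pauli)
    q[1], q[2] = q[2], q[1]
    return "".join(q)


def logical_h(pauli):
    """Transversal Hadamard followed by the inner SWAP."""
    return inner_swap(transversal_h(pauli))


def relabel(labels):
    """Swap the bookkeeping labels of qubits 2 and 3 instead of moving them."""
    new = list(labels)
    new[1], new[2] = new[2], new[1]
    return new
```

Under this map $X_L = XXII$ goes to $Z_L = ZIZI$ and vice versa, and the spectator operators are exchanged in the same way, so the combination acts as a Hadamard on both encoded qubits.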

To achieve universal quantum computation with Knill's post-selection scheme, it
remains to prepare the $+1$ eigenstate of $Y$, $|{+i}\rangle = \frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle)$, and the magic state
$T|+\rangle$ at level $(L-1)$. We use the ancilla factory of the Bacon-Shor code as described in the
previous section. The only difference is the decoding operation in the injection circuit, which
is shown in Figure 26 for the $C_4$ code.


8.2  The  Qubit  Layout  of  the  C4  Code 

Herein we describe the physical qubit layout for Knill's post-selection scheme and estimate
the number of physical gate operations and time required for each logical operation. As in the
previous sections, a gate operation at level $m$ is followed by an error-correction (detection)
routine at level $m$.

Following the tile structures presented in [12,35], we design a 2-dimensional $5 \times 5$ lattice
architecture of physical qubits to represent a logical qubit of the $C_4$ code. A tile is initialized
as one of the following two structures:
Fig. 24. Preparation of the $C_{ec}$-encoded logical state $|0\rangle$ using $C_{ed}$ code blocks.


Fig.  25.  Circuit  for  the  Knill  syndrome  extraction. 


The four data qubits of the $C_4$ code are denoted by $d_1$, $d_2$, $d_3$, and $d_4$. The O's are dummy
qubits used for swapping with the data or ancilla qubits in communication or for the ancilla
preparation, and their states are irrelevant to the computation. Each qubit in the tile is encoded
in a lower-level tile structure.

We  have  the  following  operations  of  the  C4  code: 

1. Error detection (ED)

2. Horizontal and vertical CNOT gates (hCNOT/vCNOT)

3. Horizontal and vertical SWAP gates (hSWAP/vSWAP)

4. Measurement in the X basis or the Z basis ($M_X$ and $M_Z$)

5. The Pauli operators $X$, $Y$, $Z$ and the Hadamard gate ($H$)

6. Preparation of the ancilla qubits $|+\rangle$ or $|0\rangle$ ($P_{|+\rangle}$ and $P_{|0\rangle}$)

7. The phase gate $S$ and the $\pi/8$ gate $T$

Fig. 26. The decoding circuit for $C_{ed}$.

For simplicity, all lower-level gates are assumed to take one time step, which is the longest
execution time among all gates. In reality, a qubit idles for the rest of
the time step after it passes through a fast gate, and the error rate of this operation is the physical
gate error rate plus the memory error rate for the idle time. We try to optimize the gate
operations with respect to the numbers of SWAPs, idle qubits, and time steps.

Detailed illustrations of the movement and operations inside the tile are shown in Appendix
17.1. Let ops($\mathrm{Gate}_{(m)}$) denote the gate operation of the concatenated code at level $m$ ($m-1$
levels of $C_{ed}$ and one level of $C_{ec}$), and ops($\mathrm{Gate}^{C_4}_{(m)}$) denote the gate operation of the $C_4$ code
at level $m$. Similarly for time($\mathrm{Gate}_{(m)}$) and time($\mathrm{Gate}^{C_4}_{(m)}$).

8.2.1  Error  Detection 

We first consider the error detection circuit $\mathcal{ED}_+$ in Figure 23. $\mathcal{ED}_0$ behaves in the same way.


ovs(£Vc^)  =  +  4Pg(m_1)  +  6vCNOTfa_1}  +  GhCNOT g_1}  ^ 

+  4  +  4M^m_1}  +  4  vSWAPfc^  +  4  hSWAPfa_1} 

time(£Vf*m))  =  ma  x(P^{m_1),P^){m_1)) 

+  +  vCNOT g_1}  +  hCNOT^_x)  (36b) 

+  m^vCNOT^hCNOT^)  +  vSWAPft^  +  hSWAPft^ 

We find that the operation count and time of $\mathcal{ED}_0$ are the same as those of $\mathcal{ED}_+$, and we omit the
subscripts 0 or +.

8.2.2  Fault-Tolerant  Horizontal  and  Vertical  CNOT  Gate 

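\begin{align}
\mathrm{ops}(\mathrm{vCNOT}^{C_4}_{(m)}) &= 4\,\mathrm{hCNOT}^{C_4}_{(m-1)} + 8\,\mathrm{hSWAP}^{C_4}_{(m-1)} + 40\,\mathrm{vSWAP}^{C_4}_{(m-1)} + 2\,\mathcal{ED}^{C_4}_{(m)} \tag{37a}\\
\mathrm{time}(\mathrm{vCNOT}^{C_4}_{(m)}) &= \mathrm{hCNOT}^{C_4}_{(m-1)} + 4\,\mathrm{vSWAP}^{C_4}_{(m-1)} \tag{37b}\\
&\quad + 2\max(\mathrm{vSWAP}^{C_4}_{(m-1)}, \mathrm{hSWAP}^{C_4}_{(m-1)}) + \mathcal{ED}^{C_4}_{(m)} \notag\\
\mathrm{ops}(\mathrm{hCNOT}^{C_4}_{(m)}) &= 4\,\mathrm{vCNOT}^{C_4}_{(m-1)} + 8\,\mathrm{vSWAP}^{C_4}_{(m-1)} + 40\,\mathrm{hSWAP}^{C_4}_{(m-1)} + 2\,\mathcal{ED}^{C_4}_{(m)} \tag{37c}\\
\mathrm{time}(\mathrm{hCNOT}^{C_4}_{(m)}) &= \mathrm{vCNOT}^{C_4}_{(m-1)} + 4\,\mathrm{hSWAP}^{C_4}_{(m-1)} \tag{37d}\\
&\quad + 2\max(\mathrm{vSWAP}^{C_4}_{(m-1)}, \mathrm{hSWAP}^{C_4}_{(m-1)}) + \mathcal{ED}^{C_4}_{(m)} \notag
\end{align}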

8.2.3  Fault- Tolerant  Horizontal  and  Vertical  SWAP  Gate 

For  a  SWAP  operation  to  be  fault-tolerant,  we  only  swap  a  data  or  ancilla  qubit  with  a 
dummy  qubit. 

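Suppose $\mathrm{SWAP}^{C_4}_{(0)}$ is the swap of any two adjacent physical qubits (horizontal
or vertical). Since we only swap a data or ancilla qubit with a dummy qubit, only one
tile is followed by error correction.

\begin{align}
\mathrm{ops}(\mathrm{vSWAP}^{C_4}_{(m)}) &= 20\,\mathrm{vSWAP}^{C_4}_{(m-1)} + \mathcal{ED}^{C_4}_{(m)} \tag{38a}\\
\mathrm{time}(\mathrm{vSWAP}^{C_4}_{(m)}) &= 5\,\mathrm{vSWAP}^{C_4}_{(m-1)} + \mathcal{ED}^{C_4}_{(m)} \tag{38b}
\end{align}

Similarly for hSWAP:

\begin{align}
\mathrm{ops}(\mathrm{hSWAP}^{C_4}_{(m)}) &= 20\,\mathrm{hSWAP}^{C_4}_{(m-1)} + \mathcal{ED}^{C_4}_{(m)} \tag{39a}\\
\mathrm{time}(\mathrm{hSWAP}^{C_4}_{(m)}) &= 5\,\mathrm{hSWAP}^{C_4}_{(m-1)} + \mathcal{ED}^{C_4}_{(m)} \tag{39b}
\end{align}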

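8.2.4  Fault-Tolerant Measurements

Measurement in the X-basis:

\begin{align}
\mathrm{ops}(M^{C_4}_{X(m)}) &= 4\,M^{C_4}_{X(m-1)} + \mathcal{ED}^{C_4}_{(m)} = 4^{m} M^{C_4}_{X(0)} + \mathcal{ED}^{C_4}_{(m)} \tag{40a}\\
\mathrm{time}(M^{C_4}_{X(m)}) &= M^{C_4}_{X(0)} + \mathcal{ED}^{C_4}_{(m)} \tag{40b}
\end{align}

Similarly for measurement in the Z-basis:

\begin{align}
\mathrm{ops}(M^{C_4}_{Z(m)}) &= 4\,M^{C_4}_{Z(m-1)} + \mathcal{ED}^{C_4}_{(m)} = 4^{m} M^{C_4}_{Z(0)} + \mathcal{ED}^{C_4}_{(m)} \tag{41a}\\
\mathrm{time}(M^{C_4}_{Z(m)}) &= M^{C_4}_{Z(0)} + \mathcal{ED}^{C_4}_{(m)} \tag{41b}
\end{align}

8.2.5  Fault-Tolerant Pauli Gates and Hadamard

Fault-tolerant implementation of the X gate:

\begin{align}
\mathrm{ops}(X^{C_4}_{(m)}) &= 2\,X^{C_4}_{(m-1)} + \mathcal{ED}^{C_4}_{(m)} \tag{42a}\\
\mathrm{time}(X^{C_4}_{(m)}) &= X^{C_4}_{(0)} + \mathcal{ED}^{C_4}_{(m)} \tag{42b}
\end{align}

Fault-tolerant implementation of the Z gate:

\begin{align}
\mathrm{ops}(Z^{C_4}_{(m)}) &= 2\,Z^{C_4}_{(m-1)} + \mathcal{ED}^{C_4}_{(m)} = 2^{m} Z^{C_4}_{(0)} + \mathcal{ED}^{C_4}_{(m)} \tag{43a}\\
\mathrm{time}(Z^{C_4}_{(m)}) &= Z^{C_4}_{(0)} + \mathcal{ED}^{C_4}_{(m)} \tag{43b}
\end{align}

Fault-tolerant implementation of the Y gate:

\begin{align}
\mathrm{ops}(Y^{C_4}_{(m)}) &= X^{C_4}_{(m-1)} + Z^{C_4}_{(m-1)} + Y^{C_4}_{(m-1)} + \mathcal{ED}^{C_4}_{(m)} \notag\\
&= 2^{m-1} X^{C_4}_{(0)} + 2^{m-1} Z^{C_4}_{(0)} + Y^{C_4}_{(m-1)} + \mathcal{ED}^{C_4}_{(m)} \tag{44a}\\
&= (2^{m} - 1)\,(X^{C_4}_{(0)} + Z^{C_4}_{(0)}) + Y^{C_4}_{(0)} + \mathcal{ED}^{C_4}_{(m)} \notag\\
\mathrm{time}(Y^{C_4}_{(m)}) &= Y^{C_4}_{(0)} + \mathcal{ED}^{C_4}_{(m)} \tag{44b}
\end{align}

Fault-tolerant implementation of the H gate:

\begin{align}
\mathrm{ops}(H^{C_4}_{(m)}) &= 4\,H^{C_4}_{(m-1)} + \mathcal{ED}^{C_4}_{(m)} = 4^{m} H^{C_4}_{(0)} + \mathcal{ED}^{C_4}_{(m)} \tag{45a}\\
\mathrm{time}(H^{C_4}_{(m)}) &= H^{C_4}_{(0)} + \mathcal{ED}^{C_4}_{(m)} \tag{45b}
\end{align}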
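
The closed form for the Y gate, $(2^{m} - 1)(X_{(0)} + Z_{(0)}) + Y_{(0)}$, can be cross-checked against the level-by-level recursion with a few lines of Python. This is a sketch under the assumption, consistent with the closed forms above, that the error-detection term is added once at the analyzed level and is therefore left out of the lower-level counts; the function names and unit Level-0 costs are illustrative.

```python
def ops_x(m, x0=1):
    """Level-0 X operations in a level-m X gate: two lower-level X gates per level."""
    return x0 if m == 0 else 2 * ops_x(m - 1, x0)


def ops_z(m, z0=1):
    """Level-0 Z operations in a level-m Z gate: two lower-level Z gates per level."""
    return z0 if m == 0 else 2 * ops_z(m - 1, z0)


def ops_y(m, x0=1, z0=1, y0=1):
    """Y = YXZI at each level: one lower-level X, one Z and one Y."""
    if m == 0:
        return y0
    return ops_x(m - 1, x0) + ops_z(m - 1, z0) + ops_y(m - 1, x0, z0, y0)


def ops_y_closed(m, x0=1, z0=1, y0=1):
    """Closed form of the recursion, excluding the error-detection term."""
    return (2 ** m - 1) * (x0 + z0) + y0
```

The two functions agree for every level because the geometric sum $\sum_{k=0}^{m-1} 2^{k}$ equals $2^{m} - 1$.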

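8.2.6  Fault-Tolerant Preparation of Logical States $P_{|0\rangle}$ and $P_{|+\rangle}$

\begin{align}
\mathrm{ops}(P^{C_4}_{|0\rangle(m)}) &= 2\,P^{C_4}_{|0\rangle(m-1)} + 2\,P^{C_4}_{|+\rangle(m-1)} + 2\,\mathrm{hCNOT}^{C_4}_{(m-1)} \tag{46a}\\
&\quad + 4\,\mathrm{hSWAP}^{C_4}_{(m-1)} + \mathcal{ED}^{C_4}_{(m)} \notag\\
\mathrm{time}(P^{C_4}_{|0\rangle(m)}) &= \max(P^{C_4}_{|0\rangle(m-1)}, P^{C_4}_{|+\rangle(m-1)}) + \mathrm{hCNOT}^{C_4}_{(m-1)} \tag{46b}\\
&\quad + \mathrm{hSWAP}^{C_4}_{(m-1)} + \mathcal{ED}^{C_4}_{(m)} \notag\\
\mathrm{ops}(P^{C_4}_{|+\rangle(m)}) &= 2\,P^{C_4}_{|0\rangle(m-1)} + 2\,P^{C_4}_{|+\rangle(m-1)} + 2\,\mathrm{vCNOT}^{C_4}_{(m-1)} \tag{47a}\\
&\quad + 4\,\mathrm{vSWAP}^{C_4}_{(m-1)} + \mathcal{ED}^{C_4}_{(m)} \notag\\
\mathrm{time}(P^{C_4}_{|+\rangle(m)}) &= \max(P^{C_4}_{|0\rangle(m-1)}, P^{C_4}_{|+\rangle(m-1)}) + \mathrm{vCNOT}^{C_4}_{(m-1)} \tag{47b}\\
&\quad + \mathrm{vSWAP}^{C_4}_{(m-1)} + \mathcal{ED}^{C_4}_{(m)} \notag
\end{align}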

8.2.7  Fault-Tolerant S and T Gates

For universal quantum computation, we have to include the implementation of the $S$ gate and
the $T$ gate. These two encoded gates of the $C_4$ code cannot be fault-tolerantly prepared and we
adopt the ancilla factory method as in the previous section. The only difference between the
ancilla factories of the Bacon-Shor code and the $C_4$ code is the decoding circuit, as shown in
Figure 26. The state of the spectator qubit determines which one of the two decoding circuits
is applied. We use the circuit recursively to decode a logical state at level 0 from a logical
state at level $m$. Since only the information matters, we do not have to swap the qubits back
to their original locations.
oPs{dec^){m))  =  decg(m_1}  +  2h.CNOT^_1)

+  vCNOTfc^  +  4  hSWAPfa_1}  +  2  vSWAPft^ 

(48a) 

«me(decg(m))  =  decg(m-1)  +  hCNOT g_1} 

+  vCNOTfc^  +  vSWAPft^  +  hSWAPfc_x) 

(48b) 

ops{dec^){m))  =  dec j;4)(m_1}  +  2vCNOT^_1) 

+  hCNOT g_1}  +  4uSWMPg_1}  +  2hSWAPfa_1) 

(49a) 

tirne(decf*){m))  =  dec^)(m_1}  +  vCNOT^_1} 

+  hCNOT g_1}  +  ^WMPg_1}  +  vSWAPfc^ 

(49b) 

Now assume the $|{+i}\rangle$ and $T|+\rangle$ states can be efficiently prepared and transported to the destination.

ops(S^)  =  4  hSWAPft^  +  20  vSWAPft^  +  ShCNOT g_1}  +  2  (50a) 

time(S^)  =  hSWAPft^  +  5  vSWAPfc_1}  +  2hCNOT$_1)  +  2  Hfa  (50b) 

Note  that  an  error  correction  follows  the  last  H  gates. 

0Ps{T^)  =  8  hSWAP^_x)  +  20  vSWAPfc_1}  +  AhCNOT g_1}  +  Mcz\m)  +  (51a) 

ttme(Tg))  =  2hSWAPfc_1)  +  5  vSWAPft^  +  hCNOT g}  +  A^m)  +  S<*  (51b) 

Note  that  an  error  correction  follows  the  S  gate. 

Remark:  It  is  possible  to  combine  a  gate  operation  with  the  error  correction  and  save 
several  time  steps. 

8.3  Resources  with  the  Concatenation  Code 

Here  we  estimate  the  resources  needed  to  perform  a  single  logical  gate  in  the  concatenation 
of  the  Steane  code  with  (L  —  1)  levels  of  the  C4  code.  We  apply  the  recursive  relations  of  the 
Steane  code  but  with  the  lower-level  gate  operations  of  the  concatenated  C4  code. 

Fault-Tolerant  Measurements 

Measurement  in  the  Z-basis: 


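\begin{align}
\mathrm{ops}(M_{Z(L)}) &= 7\,M^{C_4}_{Z(L-1)} + \mathcal{EC}_{(L)} \tag{52a}\\
\mathrm{time}(M_{Z(L)}) &= M^{C_4}_{Z(L-1)} + \mathcal{EC}_{(L)} \tag{52b}
\end{align}

Measurement in the X-basis:

\begin{align}
\mathrm{ops}(M_{X(L)}) &= 7\,M^{C_4}_{X(L-1)} + \mathcal{EC}_{(L)} \tag{53a}\\
\mathrm{time}(M_{X(L)}) &= M^{C_4}_{X(L-1)} + \mathcal{EC}_{(L)} \tag{53b}
\end{align}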

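Fault-Tolerant Pauli Gates, H Gates

Fault-tolerant implementation of the X gate:

\begin{align}
\mathrm{ops}(X_{(L)}) &= 7\,X^{C_4}_{(L-1)} + \mathcal{EC}_{(L)} \tag{54a}\\
\mathrm{time}(X_{(L)}) &= X^{C_4}_{(L-1)} + \mathcal{EC}_{(L)} \tag{54b}
\end{align}

Fault-tolerant implementation of the Z gate:

\begin{align}
\mathrm{ops}(Z_{(L)}) &= 7\,Z^{C_4}_{(L-1)} + \mathcal{EC}_{(L)} \tag{55a}\\
\mathrm{time}(Z_{(L)}) &= Z^{C_4}_{(L-1)} + \mathcal{EC}_{(L)} \tag{55b}
\end{align}

Fault-tolerant implementation of the Y gate:

\begin{align}
\mathrm{ops}(Y_{(L)}) &= 7\,Y^{C_4}_{(L-1)} + \mathcal{EC}_{(L)} \tag{56a}\\
\mathrm{time}(Y_{(L)}) &= Y^{C_4}_{(L-1)} + \mathcal{EC}_{(L)} \tag{56b}
\end{align}

Fault-tolerant implementation of the H gate:

\begin{align}
\mathrm{ops}(H_{(L)}) &= 7\,H^{C_4}_{(L-1)} + \mathcal{EC}_{(L)} \tag{57a}\\
\mathrm{time}(H_{(L)}) &= H^{C_4}_{(L-1)} + \mathcal{EC}_{(L)} \tag{57b}
\end{align}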

S  and  T  gates 

ops(S(L))  =  PmL)  +  28  5g_1}  +  2  Pcat(i)  +  UvCNOTg^  (5ga) 

+  7  hCNOT^_x)  +  2Mx{l)  +  Mz{l)  +  £C(l) 

time{S(L))  =  max((P|0)(i)  +  S'(^4_1)),  Pcat(L))  +  max(5'^4_1),  Pcat(i)) 

+  2  vCNOTf*_x)  +  hCNOTg_x)  +  2ma x(Sg_1})  M^L_1})  +  MZ(L)  +  £C(L) 

ops{T[L))  =  PmL)  +  28Sg_1}  +  2Pcat(i)  +  14uGiVOr^4_1)  (5gc) 

+  7  hCNOTf*_x)  +  2MX(l)  +  Mz(l)  +  £C{l) 

time(T{L))  =  max{(P\0)(L)  +  T^_x)),Pca t(L))  +  max^jj,  Pcat(L))  ,5gd) 

+  2hCNOT°*_1)  +  hCNOT^_x)  +  2  max(Tg_1) ,  m£4l_1})  +  Mz(i)  +  f  C(L) 
Fault-tolerant implementation of the CNOT gate:


ops(hCNOT{L))  =  JhCNOT fL4_±)  +  ll2hSWAPt g4_1}  +  UvSWAlf*^  +  2£C(L)(59a) 

time{hCNOT(L j)  =  max(hSW  AP^_1yvSW  AP^‘_1j) 

+  bhSWAPg4^  +  vCNOT g4_1} 

+  P^D’  hSWAPg4^) 

+  max(P|+^L_1^,  P|0j4(i_1p  hCNOT^-^, 

vCNOTfr_1),hSWAP{ g4_1})  +  P$(i_1}) 

+  max(P|<^(i_1),  P|^‘(i_1),  hCNOT^^yvCNOT^-^)  (59b) 

+  2  max(/iSWAi^ 

+  2max(hSWAP(<£4_1)  +  hCNOlf'^yvCNOlf*^) 

+  2  max(/iS’lFAP(^4_1),  hGNOlf*^) 

+  2  max^SWAPg4^,  vSWAP°*_x),  kCNOT°*_iy  MgL_1}) 

+  2  max(/i5'Wi4J^,_1),  vSWAP°*_iy  M%\L_X)) 

+  2vCNOT$_1)+2M°*{l_1) 

ops(vCNOT(L))  =  7vCNOT ^4_1}  +  70uS'VF^P(^4_1)  +  mSWAPg4^  +  2£C(L)  (59c) 


time(vCNOT(L ))  =  max^PWAP^^hSWAP^4^) 

+  2  max(uSW  APf^vCNOT^^)  +  4  *  v5fFAPg4_1} 
H-max^g^^Pg4^^^ 

+  max(Pf)(i_1)Pg4(i_1)/iC7VOr(^_1) 
vCNOTy*^  hSWApCj*_  t } )  +  max(^‘)(i_1)ig(i_1)) 

+  max(P,^  (£_  1}  P^x,- 1}  hCNOT^_^vCNOT^_  ^ ) 

+  2  max^WAPg4,^^^^^ 

+  2max(/iSWAP(J4_1)  +  hCNOlf'^vCNOlf*^) 

+  2max{hSWAPfL4_1)hCNOT°4_1)) 

+  2  ma xihSWAPf^vSWApC^hCNOTfl^MZ*^) 
+  2  ma ^(hSWAP^vSWAP^M^) 

+  2vCNOTg_1)+2M^L_1) 


(59d) 


SWAP  Operation 

We  swap  two  logical  qubits  at  the  top  level  of  the  Steane  code  and  thus  we  need  two  error- 
correction  blocks  at  this  level.  Notice  that  we  do  not  swap  a  data  or  ancilla  qubit  with  a 
dummy  qubit  at  a  lower  level. 
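\begin{align}
\mathrm{ops}(\mathrm{hSWAP}_{(L)}) &= 112\,\mathrm{hSWAP}^{C_4}_{(L-1)} + 14\,\mathrm{vSWAP}^{C_4}_{(L-1)} + 2\,\mathcal{EC}_{(L)} \tag{60a}\\
\mathrm{time}(\mathrm{hSWAP}_{(L)}) &= 8\,\mathrm{hSWAP}^{C_4}_{(L-1)} + 2\,\mathrm{vSWAP}^{C_4}_{(L-1)} + \mathcal{EC}_{(L)} \tag{60b}\\
\mathrm{ops}(\mathrm{vSWAP}_{(L)}) &= 84\,\mathrm{vSWAP}^{C_4}_{(L-1)} + 14\,\mathrm{hSWAP}^{C_4}_{(L-1)} + 2\,\mathcal{EC}_{(L)} \tag{60c}\\
\mathrm{time}(\mathrm{vSWAP}_{(L)}) &= 6\,\mathrm{vSWAP}^{C_4}_{(L-1)} + 2\,\mathrm{hSWAP}^{C_4}_{(L-1)} + \mathcal{EC}_{(L)} \tag{60d}
\end{align}

Fault-Tolerant Preparation of a Logical State

The detailed tile operations are given in Appendix 18.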

3P, 


c4 


Ops(P\0)(L))  -  4P|0)4(i_1)  -T  u.  |+)(i_1) 


5vCNOT 


9  hSWAPf* 


+  AhCNOTf*^ 

1  ZvSWAPf* 


time(Pl0){L) )  =  ma x(Pg(L_1),P|gl(L_1)) 

+  max(/lClVOTg(i_1),r;CPVO^(i_1))  +  /iCAWg^ 

+  2  max(vCN  OT^  (L_1} ,  /iSW  AP^  i_1) ,  vSW  AP^(  L_  1} ) 

+  3max(™P|^_1),^lf4P|^(t_1))  +  /iSWMPg(i_1} 

The  tile  operations  of  preparation  of  the  logical  state  |+)  are  omitted. 

ops(P\+){L))  =  3  Pg(L_1}  +  4Pg(^a)  +  ihCNOTf^ 


5vCNOTff_1) 


9h,SW  APfj*1} 


\?yuSWAP^_V) 


(61a) 


(61b) 


(62a) 


(62b) 


time(Pl+HL))  =  max(P|<;4)(i_1),P|J4(L_1)) 

+  m^hCNOT^^vCNOT^^)  +  hCNOT g‘_1} 

+  2max(nC'A^OTj^(i_1),  /iS'W/AP|^(i_1),  rS'lTAP|^l(i_1)) 
+  3max(/lWA^4(L_1),nPTTAP|^4(i_1))  + 


Error  Detection  and  Correction 

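We use the Knill syndrome extraction in Figure 25 instead of the Steane syndrome
extraction used in the Steane code. Here we assume the data tile $|Q\rangle$ and the two ancilla tiles
$|A\rangle$, $|B\rangle$ are vertically arranged in the order $|Q\rangle$, $|A\rangle$, $|B\rangle$.

\begin{align}
\mathrm{ops}(\mathcal{EC}_{(L)}) &= P_{|+\rangle(L)} + P_{|0\rangle(L)} + 14\,\mathrm{vCNOT}^{C_4}_{(L-1)} + 7\,M^{C_4}_{Z(L-1)} \tag{63a}\\
&\quad + 7\,M^{C_4}_{X(L-1)} + 7\,X^{C_4}_{(L-1)} + 7\,Z^{C_4}_{(L-1)} + 14\,\mathrm{vSWAP}^{C_4}_{(L-1)} \notag\\
\mathrm{time}(\mathcal{EC}_{(L)}) &= \max(P_{|+\rangle(L)}, P_{|0\rangle(L)}) + 2\,\mathrm{vCNOT}^{C_4}_{(L-1)} \tag{63b}\\
&\quad + \max(M^{C_4}_{Z(L-1)}, M^{C_4}_{X(L-1)}) + X^{C_4}_{(L-1)} + Z^{C_4}_{(L-1)} + 2\,\mathrm{vSWAP}^{C_4}_{(L-1)} \notag
\end{align}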

Remark:  It  would  be  good  if  we  can  squeeze  the  size  of  the  three  tiles  for  error  correction. 
8.4  Error Threshold

Herein we estimate the error threshold of Knill's post-selection scheme in the 2-dimensional
tile for the case of stochastic, adversarial noise. Following the procedure presented in [11, 12,
32], we count the number of malignant pairs of locations in the extended rectangle of the
CNOT gate. A rectangle of the CNOT gate consists of the bitwise CNOTs on two logical
qubits followed by two error detection ($\mathcal{ED}$) blocks. An extended rectangle of the CNOT gate
is a rectangle of the CNOT gate with two preceding error detection blocks. As shown in
Figure 27, we assume the preceding $\mathcal{ED}$s are $\mathcal{ED}_+$s and the following $\mathcal{ED}$s are $\mathcal{ED}_0$s.


Fig.  27.  The  extended  rectangle  of  the  CNOT  gate. 

A set of locations is called malignant if errors in these locations could make the calculation
of the rectangle incorrect. There are seven types of locations in a CNOT extended rectangle:
(1) $P_{|+\rangle}$; (2) $P_{|0\rangle}$; (3) $M_X$; (4) $M_Z$; (5) hSWAP/vSWAP; (6) hCNOT/vCNOT; (7) idling
qubits.

To obtain a higher error threshold, we optimize the tile operations of the extended rectangle of the CNOT gate; an animation is available online. There are 196 locations in the extended rectangle of the CNOT gate: 32 idle qubits and 164 gates, of which 38 are SWAPs. Here we assume that the error detection blocks begin before the time step at which the data qubits come in, and thus there are no idle qubits at time steps 1, 2, and 3 in the preceding ED. We find that the numbers of malignant pairs of locations of each kind are given by

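\[
a = \begin{pmatrix}
4 & 8 & 8 & 0 & 0 & 32 & 16 \\
  & 0 & 0 & 14 & 96 & 80 & 32 \\
  &   & 16 & 0 & 96 & 104 & 32 \\
  &   &    & 16 & 96 & 112 & 32 \\
  &   &    &    & 442 & 672 & 268 \\
  &   &    &    &     & 322 & 288 \\
  &   &    &    &     &     & 106
\end{pmatrix},
\]

where $a_{i,j}$ represents the number of malignant pairs at locations of types $i$ and $j$.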

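Let $\epsilon_j^{(m)}$ be the error rate of type $j$ at level $m$. For error correction to be effective, we require

\[
\epsilon^{(m+1)} = \sum_{i \le j} a_{i,j}\, \epsilon_i^{(m)} \epsilon_j^{(m)} + O\big((\epsilon_{\max}^{(m)})^3\big) < \epsilon^{(m)}, \tag{64}
\]

where $a_{i,j}$ is the number of malignant pairs of types $i$ and $j$ and $\epsilon_{\max}^{(m)}$ is the maximum of the seven types of error rate.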

Remark: In general, the error rate of a SWAP gate is higher than that of a CNOT gate, since a SWAP is implemented by a series of gate operations, such as three CNOT gates. However, in the 2-dimensional tile, we only swap a data or ancilla qubit with a dummy qubit, and the cost of such a SWAP gate is less than that of a CNOT gate, as can be seen in Appendix 17.1.

We assume that all errors of weight 3 or larger are malignant and that the effect of errors of weight higher than three can be ignored. (This might still be an overestimate of the higher-order terms.) Let $\gamma$ denote the ratio of the memory error rate of the idle qubits to the gate error rate. We define the number of effective malignant pairs as

\[
\sum_{i \le j \le 6} a_{i,j} + \gamma \sum_{i=1}^{6} a_{i,7} + \gamma^2 a_{7,7}.
\]

If we assume that the error rates are the same for all types of locations, the total number of malignant pairs is 2892 and Eq. (64) becomes

\[
\binom{196}{3} \big(\epsilon^{(m)}\big)^3 + 2892\, \big(\epsilon^{(m)}\big)^2 < \epsilon^{(m)},
\]

which gives an error threshold

\[
\epsilon(\gamma = 1) < 3.06 \times 10^{-4}.
\]

If we assume $\gamma = 0.1$ or $\gamma = 0$, the numbers of effective malignant pairs are 2185.9 and 2118.0, and the error thresholds become

\[
\epsilon(\gamma = 0.1) < 4.06 \times 10^{-4}
\]

and

\[
\epsilon(\gamma = 0) < 4.14 \times 10^{-4},
\]

respectively.
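The pair counts and two of the thresholds quoted above can be cross-checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the cubic coefficient is the number of weight-3 sets of locations, i.e. C(196, 3) for $\gamma = 1$ and C(164, 3) for $\gamma = 0$ (idle locations dropped). The text does not state how the cubic term is weighted for intermediate $\gamma$, so that case is not computed.

```python
from math import comb, sqrt

# Upper-triangular malignant-pair counts a[i][j] (i <= j) for the seven
# location types; index 6 is the idle-qubit type.
A = [
    [4, 8, 8, 0, 0, 32, 16],
    [0, 0, 0, 14, 96, 80, 32],
    [0, 0, 16, 0, 96, 104, 32],
    [0, 0, 0, 16, 96, 112, 32],
    [0, 0, 0, 0, 442, 672, 268],
    [0, 0, 0, 0, 0, 322, 288],
    [0, 0, 0, 0, 0, 0, 106],
]


def effective_pairs(a, gamma):
    """Gate-gate pairs + gamma * gate-idle pairs + gamma^2 * idle-idle pairs."""
    gate_gate = sum(a[i][j] for i in range(6) for j in range(i, 6))
    gate_idle = sum(a[i][6] for i in range(6))
    idle_idle = a[6][6]
    return gate_gate + gamma * gate_idle + gamma ** 2 * idle_idle


def threshold(triples, pairs):
    """Positive root of triples*eps^2 + pairs*eps - 1 = 0, i.e. the largest
    eps for which triples*eps^3 + pairs*eps^2 < eps still holds."""
    return (-pairs + sqrt(pairs ** 2 + 4 * triples)) / (2 * triples)


if __name__ == "__main__":
    print(effective_pairs(A, 1), effective_pairs(A, 0.1), effective_pairs(A, 0))
    print(threshold(comb(196, 3), effective_pairs(A, 1)))  # gamma = 1
    print(threshold(comb(164, 3), effective_pairs(A, 0)))  # gamma = 0 (assumed)
```

Under these assumptions the sketch reproduces the pair counts 2892, 2185.9, and 2118 and the thresholds for $\gamma = 1$ and $\gamma = 0$ to the quoted precision.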

8.5  Ancilla  Factory 

We first analyze the ancilla factories for $|+i\rangle$ states. The analysis of the ancilla factories for the magic state $T|+\rangle$ follows. Suppose $\epsilon_{anc}^{(r)}$ is the error probability of the output state of $r$ rounds of the distillation protocol in Figure 16 without the twirling blocks. Let us assume that the distillation circuit is perfect. From [32], we have

\[
\epsilon_{anc}^{(r)} \le \frac{1}{2} \left( 2\, \epsilon_{anc}^{(0)} \right)^{2^r},
\]

where $\epsilon_{anc}^{(0)}$ is the initial error rate of preparing a $|+i\rangle$ state. A $|+i\rangle$ state can be obtained by preparing a $|+\rangle$ state followed by a phase gate $S$. Thus $\epsilon_{anc}^{(0)}$ can be approximated by the sum of the error rate of preparing $|+\rangle$ and the error rate of a physical phase gate $S$, which is assumed to be a known property of the physical quantum technology.

Then we choose the $|+i\rangle$ state with error rate $\epsilon_{anc}^{(r)}$ such that the output of the injection circuit in Figure 14 at the highest level of concatenation has an error rate smaller than the error threshold of the quantum error-correction scheme. In reality, the distillation circuit is not perfect, and we would expect to need more rounds of distillation than the $r$ obtained this way.

Now we have the number of gates for a successful preparation of a $|+i\rangle$ state, which is the number of gates in the distillation protocol plus the number of gates in the injection circuit of Figure 14.
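The choice of the number of distillation rounds can be illustrated with a short Python sketch. It assumes the bound $\epsilon_{anc}^{(r)} \le \frac{1}{2}(2\epsilon_{anc}^{(0)})^{2^r}$ for a perfect distillation circuit and treats the bound as if it were the actual error rate. The function names and the sample numbers are hypothetical.

```python
def anc_error_bound(eps0, rounds):
    """Upper bound on the ancilla error after `rounds` distillation rounds,
    assuming a perfect circuit: (1/2) * (2 * eps0) ** (2 ** rounds)."""
    return 0.5 * (2.0 * eps0) ** (2 ** rounds)


def rounds_needed(eps0, target, max_rounds=64):
    """Smallest number of rounds whose error bound falls below `target`."""
    if not 0.0 < eps0 < 0.5:
        raise ValueError("bound only decreases for 0 < eps0 < 1/2")
    if target <= 0.0:
        raise ValueError("target must be positive")
    for rounds in range(max_rounds + 1):
        if anc_error_bound(eps0, rounds) < target:
            return rounds
    raise RuntimeError("target not reached within max_rounds")


if __name__ == "__main__":
    # Example: 1% initial error, target below 1e-10.
    print(rounds_needed(1e-2, 1e-10))
```

Because the bound squares at each round, a handful of rounds is enough for most targets; as noted above, an imperfect distillation circuit would require more.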

Since the cost of an ancilla factory is much higher than the cost of moving the $|+i\rangle$ states (roughly $O(10^4)$ based on our numerical estimates), we prefer to reduce the number of ancilla factories and instead move the $|+i\rangle$ states around the quantum computer. Suppose there are $K$ logical qubits in an algorithm. Then the linear size of the quantum computer is roughly $\sqrt{K}$ times the size of a tile, and the movement distance of a $|+i\rangle$ state is at most $\sqrt{K}/2$ times the size of a tile. Thus the movement cost is about $\sqrt{K}/2$ SWAP gates at the highest level of concatenation.

Now we determine the number of ancilla factories $F$ we need. Assume that a computational step for a given combination of algorithm, quantum error-correcting code, and physical technology takes time $T_{comp}$. A computational step can be thought of as the period between two phase gates applied to the same qubits. Assume that a factory takes time $T_{anc}$ to prepare a $|+i\rangle$ state. (These ancilla qubits are prepared offline; they are preserved and moved to their target locations by SWAP gates.) Assume that the maximum number of phase gates in a computational step is $N$. (This number is called the parallelization factor of the phase gate in the algorithm.) Assume that an ancilla factory prepares a $|+i\rangle$ state with success probability $p_{succ}$. Let $b$ be the number of $|+i\rangle$ states successfully generated by $F$ ancilla factories. Note that $b$ obeys a binomial distribution with parameters $F$ and $p_{succ}$, which can be approximated by a normal distribution $\mathcal{N}(F p_{succ}, F p_{succ}(1 - p_{succ}))$. Then $F$ ancilla factories can on average generate

\[
F\, \frac{T_{comp}}{T_{anc}} \times p_{succ}
\]

$|+i\rangle$ states in time $T_{comp}$. (We can replace $T_{comp}$ by $T_{comp} - (\sqrt{K}/2)\,\mathrm{time}(SWAP^{(m)})$ for higher accuracy.) We would like this number to be significantly larger than $N$ to ensure that we have enough $|+i\rangle$ states. It is reasonable to choose $F p_{succ}$ to be 5 standard deviations larger than $N T_{anc}/T_{comp}$, that is,

\[
F p_{succ} \ge N\, \frac{T_{anc}}{T_{comp}} + 5 \sqrt{F p_{succ} (1 - p_{succ})}.
\]

Thus $F$ can be determined by choosing an appropriate $T_{comp}$.

We obtain this estimate for two reasons: first, to avoid the cost of the error correction needed to preserve the $|+i\rangle$ states; and second, to guarantee that we have a sufficient number of $|+i\rangle$ states when they are needed.
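The sizing rule for the number of factories can be sketched as a small search. The snippet below reads the five-deviation criterion as $F p_{succ} - 5\sqrt{F p_{succ}(1 - p_{succ})} \ge N T_{anc}/T_{comp}$; this reading is an assumption based on the normal approximation described above, and the function name and sample values are hypothetical.

```python
from math import sqrt


def factories_needed(n_phase, t_anc, t_comp, p_succ, sigmas=5.0):
    """Smallest F with F*p - sigmas*sqrt(F*p*(1-p)) >= N * T_anc / T_comp.

    n_phase: parallelization factor N of the phase gate
    t_anc:   time for one factory to prepare one |+i> state
    t_comp:  duration of one computational step
    p_succ:  success probability of a single preparation
    """
    if not 0.0 < p_succ <= 1.0:
        raise ValueError("p_succ must lie in (0, 1]")
    if n_phase <= 0 or t_anc <= 0 or t_comp <= 0:
        raise ValueError("n_phase, t_anc and t_comp must be positive")
    demand = n_phase * t_anc / t_comp
    f = 1
    # The left-hand side eventually grows linearly in F, so the loop ends.
    while f * p_succ - sigmas * sqrt(f * p_succ * (1.0 - p_succ)) < demand:
        f += 1
    return f


if __name__ == "__main__":
    print(factories_needed(n_phase=10, t_anc=1.0, t_comp=1.0, p_succ=0.9))
```

With a perfect factory ($p_{succ} = 1$) the result reduces to the demand $N T_{anc}/T_{comp}$ rounded up; lower success probabilities add a safety margin on top.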


9  Error  Correction  with  the  Surface  Code 

The rest of this document describes the methodology we used to estimate the resources necessary to implement a given algorithm fault-tolerantly using the surface code [36]. We also refer the reader to [37], which illustrates how to do computation with the surface code.
This  document  quantifies  the  cost  of  this  computation.  As  before,  we  assume  that  we  know 
the  number  of  logical  gates  required  to  implement  each  algorithm,  as  well  as  statistics  on 
gate  reliability  and  operation  time  associated  with  the  physical  quantum  technology  that  the 
algorithm  is  to  be  implemented  on.  Our  high  level  approach  is  as  follows.  First  we  estimate 
the  number  of  physical  qubits  required  to  perform  the  computation.  Then  we  estimate  the 
running  time  and  finally  the  number  of  gates  of  each  type  required  by  the  computation. 

To estimate the number of physical qubits, we first establish the code distance. Given the number of logical gates in the algorithm, the threshold [38, 39] of the code, and the error properties of the physical quantum technology, we choose a code distance high enough to guarantee that the calculation finishes successfully with probability at least 50%. We describe a tiled layout that ensures that the weight of any logical operator is at least the code distance. Since the optimal layout and the number of physical qubits depend on the syndrome extraction method of choice (Steane, Shor, Knill), we describe a separate layout for each of these methods. In addition to reserving one tile per logical qubit required by the algorithm, we also estimate the additional space needed for ancilla distillation as well as the ancillary space required to perform certain operations, such as CNOT.

The  running  time  is  obtained  by  calculating  the  time  needed  to  perform  each  elementary 
operation.  First,  we  estimate  the  time  needed  to  perform  one  round  of  syndrome  extraction 
and  error  correction  for  each  of  the  syndrome  extraction  schemes.  Then  we  estimate  the  time 
needed  to  perform  elementary  logical  gates  such  as  logical  state  preparation,  measurement 
and  the  CNOT  gate.  We  build  our  estimate  from  the  ground  up,  starting  with  estimates  for 
the  simpler  logical  gates  which  we  use  as  building  blocks  in  more  advanced  operations  such 
as  ancilla  distillation. 

A  gate  count  that  provides  the  number  of  gates  of  each  type  that  need  to  be  executed  is 
obtained  in  two  steps  as  follows.  In  the  first  step,  we  estimate  the  number  of  gates  needed  to 
perform  error  correction  during  the  entire  duration  of  the  experiment.  Since  error  correction 
is  performed  continuously,  and  all  other  operations  require  a  small  number  of  gates,  this  is  the 
dominant  factor  in  the  final  gate  count.  Note  that  the  gate  count  critically  depends  on  the 
syndrome  extraction  method  of  choice.  In  the  second  step,  we  add  the  small  number  of  gates 
required  to  perform  logical  operations.  These  additional  operations  in  the  second  step  do  not 
include  syndrome  measurement,  and  therefore  the  estimate  can  be  done  independently  of  the 
syndrome extraction method of choice. Also note that certain operations, such as Pauli operators, do not require any additional physical operations as long as we keep track of the Pauli frame.

The  rest  of  the  document  is  organized  as  follows.  In  Section  10  we  discuss  the  inputs 
(properties  of  quantum  technologies  and  algorithms)  needed  to  perform  this  estimate.  Then 
in  Section  11  we  estimate  the  number  of  physical  qubits  required  by  the  system.  In  Section  12 
we  describe  the  methodology  used  to  estimate  the  running  time  on  the  given  algorithm  and 
physical  quantum  technology,  and  finally  in  Section  13  we  estimate  the  number  of  gates  of 
each  type. 

10  Surface  Code  Estimate  Inputs  and  Goals 

We  want  to  estimate  the  resources  required  to  implement  an  algorithm  on  a  given  physical 
quantum  technology.  For  convenience  we  will  assume  that  this  data  is  given  to  us  in  the  form 
of  two  data  structures  (Algorithm  and  Technology)  containing  numbers. 

The  structure  Algorithm  has  the  following  fields: 

1. Logical qubit count: numQubits

2. Gate counts: numPrep|0⟩, numPrep|+⟩, numH, numS, numS†, numT, numT†, numCNOT, etc.

3. Gate parallelism: paralPrep|0⟩, paralPrep|+⟩, paralH, etc.

The structure Technology has the following fields:

1. Physical gate error rate: errorRate

2. Physical gate times: Prep|0⟩Time, Prep|+⟩Time, HTime, STime, S†Time, TTime, T†Time, CNOTTime, XTime, etc.
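For illustration, the two input structures could be represented as follows. This is a minimal sketch, not the data layout used by the QuRE toolbox: the Python field names are hypothetical stand-ins for the fields listed above, gate-specific fields are collected in dictionaries keyed by gate name, and the total gate count is assumed to be the sum of the per-gate counts.

```python
from dataclasses import dataclass, field


@dataclass
class Algorithm:
    """Logical-level description of an algorithm (hypothetical field names)."""
    num_qubits: int                                        # numQubits
    gate_counts: dict[str, int] = field(default_factory=dict)       # numH, numT, ...
    gate_parallelism: dict[str, int] = field(default_factory=dict)  # paralH, paralT, ...

    @property
    def num_gates(self) -> int:
        # Assumption: total logical gate count is the sum over gate types.
        return sum(self.gate_counts.values())


@dataclass
class Technology:
    """Physical-level description of a quantum technology (hypothetical)."""
    error_rate: float                                      # errorRate
    gate_times: dict[str, float] = field(default_factory=dict)  # HTime, CNOTTime, ...


if __name__ == "__main__":
    alg = Algorithm(num_qubits=4,
                    gate_counts={"H": 2, "CNOT": 3},
                    gate_parallelism={"H": 1, "CNOT": 1})
    tech = Technology(error_rate=1e-3, gate_times={"H": 1e-6, "CNOT": 2e-6})
    print(alg.num_gates, tech.error_rate)
```

Keeping the per-gate quantities in dictionaries avoids one field per gate type and makes it easy to add gates that the list above abbreviates with "etc."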

Our  goal  is  to  transform  these  inputs  into: 

1. Number of qubits: The physical area used to implement the algorithm, i.e., the number of physical qubits.

2. Runtime: The actual time to run the algorithm, in units of time.

3.  Gate  count:  The  number  of  gates  of  each  type  needed  during  the  entire  computation. 

To  do  this,  we  will  use  the  following  steps: 

1.  Find  code  distance  d :  For  a  required  logical  gate  reliability  and  the  given  physical 
machine-level  reliabilities,  determine  the  required  size  and  spacing  of  the  holes  in  the 
surface  code  lattice  to  ensure  that  logical  operations  in  the  code  have  weight  at  least  d. 

2. Determine magic state distillation precision: From the physical machine-level parameters, determine the approximate error in the undistilled state and the concatenation level required to distill the states to be usable in the logical circuit.

3.  Determine  number  of  qubits:  Determine  the  total  number  of  logical  qubits  required 
to  support  the  computation,  based  off  of  the  number  of  logical  qubits  in  the  algorithm 
and  the  number  of  logical  ancillary  qubits  to  support  CNOT,  H,  S,  and  T  operations 
(including  distillation).  Based  upon  a  layout  strategy  for  logical  qubits,  this  will  provide 
us  with  an  estimate  of  the  area  required  to  implement  the  computation.  The  layout 
strategy  used  will  depend  upon  the  constraints  of  the  physical  quantum  technology.  We 
will  assume  that  physical  qubit  layout  is  restricted  to  two  dimensions. 

4. Determine timesteps: Determine the total number of timesteps required to implement the algorithm, based upon the number of timesteps required for each of the logical operations in the algorithm. The number of timesteps depends on the level of parallelization that can be assumed between logical operations. From the timesteps we can obtain the runtime of the algorithm.

5.  Determine  number  of  gates  for  error  correction:  Count  the  number  of  gates 
needed  to  extract  all  syndromes  in  the  entire  system  and  to  correct  errors. 

6. Determine number of gates for all other operations: Count the number of additional gates needed to perform each logical operation, multiplied by the number of occurrences of these logical operations in the algorithm.

7. Determine the final gate count: Add the number of gates needed by error correction and other operations.

11 Area Estimate

Here we estimate the number of physical qubits, i.e., the number of physical qubits needed to encode the logical qubits together with the space needed to store ancillas and distill magic states. To do this, we first estimate a sufficient code distance. Then we specify a physical qubit layout for several possible syndrome extraction techniques and provide the final physical qubit estimate.


11.1  Background  and  Terminology 

Here  we  discuss  some  surface  code  terminology  necessary  to  explain  the  process  behind  our 
estimate. 

The  surface  code  [36]  places  qubits  on  a  lattice  such  as  the  one  shown  in  Figure  28  a). 
The  data  qubits  are  located  on  each  edge  of  the  grid. 

The code has stabilizers of type XXXX and ZZZZ. The XXXX stabilizers correspond to the orange diamonds in Figure 28 a), and are also called site stabilizers. The ZZZZ stabilizers correspond to the blue diamonds and are called plaquette stabilizers. Note that
stabilizers  of  weight  three  shown  as  triangles  may  be  present  at  the  edges  of  the  lattice  (we 
assume  absence  of  periodic  boundary  conditions),  or  on  the  perimeter  of  holes  in  the  surface. 

To measure each of the stabilizers, we require one or more ancilla qubits. The syndrome measurement circuits, the required number of ancillas, and the optimal qubit layout differ for each syndrome extraction method. Detailed analyses for the Steane, Shor, and Knill syndrome extraction methods are given in Sections 11.4.1, 11.4.2, and 11.4.3, respectively. Initially, to facilitate the analysis before discussing the specifics of the three syndrome extraction methods, we will think of the lattice in terms of the grid in Figure 28. Note that if we have a grid that is $w$ squares wide and $h$ squares tall, we will have a surface consisting of $2wh + w + h$ physical qubits (excluding syndrome measurement ancillas).
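The count $2wh + w + h$ follows from placing one data qubit on each edge of the grid. A brief Python sketch (the function name is hypothetical) counts horizontal and vertical edges separately:

```python
def data_qubits_on_grid(w, h):
    """Number of edges (data qubits) in a grid w squares wide and h squares tall."""
    horizontal_edges = w * (h + 1)  # w edges in each of the h + 1 rows of grid lines
    vertical_edges = h * (w + 1)    # h edges in each of the w + 1 columns of grid lines
    return horizontal_edges + vertical_edges


if __name__ == "__main__":
    print(data_qubits_on_grid(2, 3))
```

Expanding $w(h+1) + h(w+1)$ gives the expression $2wh + w + h$ used in the text.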

As discussed in [37], logical qubits are represented by areas in the surface where stabilizers are not enforced. These areas are called holes. Holes always come in pairs. If we do not enforce a rectangular region of the grid cells, we create a "smooth" hole. Figure 28 b) shows a smooth hole on the grid. If we do not enforce a region of stabilizers that is shifted by half a grid cell (in both spatial dimensions), we create a "rough" hole. Since logical Pauli operators in the code are loops that wrap around a hole or connect pairs of holes, the size and minimum spacing of these holes affect the fault tolerance of the system. The larger the hole, the better the fault tolerance.

The length of the logical operator with the smallest weight is called the code distance. In the next subsection, we describe how to choose the code distance to ensure that the computation finishes with the correct result with high probability.

11.2  Layout  with  Sufficient  Code  Distance 

Let each logical qubit be represented by a pair of smooth holes in the surface. These holes are positioned on the grid so that the distance between two holes is at least $d$ squares (the code distance) to maintain fault tolerance. Figure 29 shows this layout. Since each logical qubit may later be involved in a braiding operation (in which another hole will be in between the two holes representing the logical qubit), we place the two holes of a logical qubit a distance of $3d$ apart by default. This allows us to place a hole of width $d$ in the area between the two holes while still maintaining a distance of $d$ between all three holes, which facilitates braiding operations between logical qubits. We also lay out the hole pairs next to each other, maintaining a distance of $3d$ between the edges of every hole pair, which leaves enough room for hole movement between different logical qubits.

Fig. 28: The lattice of the surface code. (a) XXXX site stabilizers are shown in orange and ZZZZ plaquette stabilizers in blue. (b) A hole in the surface.


The required code distance $d$ can be estimated by solving the following equation (as shown in [40]):

\[
\epsilon_{Logical} \approx C_1 \left( C_2\, \frac{\epsilon_{Physical}}{\epsilon_{Threshold}} \right)^{\lfloor (d+1)/2 \rfloor}. \tag{65}
\]


This equation involves the following variables:

• Logical error rate $\epsilon_{Logical}$: The logical error rate required to guarantee a high probability of success can be estimated as $\epsilon_{Logical} \approx 0.5 / \mathrm{Algorithm.numGates}$.

• Physical error rate $\epsilon_{Physical}$: The physical error rate depends on the parameters of the physical technology (here, $\epsilon_{Physical} = \mathrm{Technology.errorRate}$).

• Code threshold $\epsilon_{Threshold}$: Several estimates of the threshold of the surface code have been presented in the literature. Here, we use the numerical estimate $\epsilon_{Threshold} \approx 0.01$ from [37].

• Code constants $C_1$ and $C_2$: The constants depend on the properties of the code. As has been done in [40], we use the estimates $C_1 \approx 0.13$ and $C_2 \approx 0.61$.

Fig. 29: Spacing between holes representing logical qubits.
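The choice of the code distance from Eq. (65) can be sketched in Python as follows. The snippet is a hedged illustration: it reads Eq. (65) with exponent $\lfloor (d+1)/2 \rfloor$, treats the approximate relation as an upper bound that must not exceed the target logical error rate $0.5/\mathrm{numGates}$, and uses hypothetical function and parameter names. The default constants are the values quoted above.

```python
def logical_error_rate(d, p_physical, p_threshold=0.01, c1=0.13, c2=0.61):
    """Estimated logical error rate at code distance d, following Eq. (65)."""
    return c1 * (c2 * p_physical / p_threshold) ** ((d + 1) // 2)


def code_distance(num_gates, p_physical, p_threshold=0.01, c1=0.13, c2=0.61):
    """Smallest d whose estimated logical error rate is at most 0.5 / num_gates."""
    if c2 * p_physical >= p_threshold:
        raise ValueError("physical error rate too high: estimate does not decrease with d")
    target = 0.5 / num_gates
    d = 1
    while logical_error_rate(d, p_physical, p_threshold, c1, c2) > target:
        d += 2  # floor((d + 1) / 2) only increases when d reaches the next odd value
    return d


if __name__ == "__main__":
    # Example: 10^12 logical gates on a technology with physical error rate 10^-3.
    print(code_distance(10 ** 12, 1e-3))
```

Stepping through odd distances only is a small optimization: an even distance has the same exponent as the preceding odd one and therefore offers no improvement under this estimate.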


11.3  Ancilla  Space 

To store a certain number of logical qubits, we need the same number of hole pairs. However, to actually perform computation, we need space to store ancillas. Moreover, our approach to computation with the surface code requires the preparation of magic states [41], which requires additional space. In our layout scheme, there are two contributors to the ancilla area:

1.  CNOT  ancilla  area:  Additional  space  is  needed  to  store  newly  initialized  ancilla 
qubits  to  perform  CNOT  operations  between  any  two  hole  pairs. 

2. Operator ancilla area: Additional space is required to distill and store magic states. Specifically, magic states are needed to perform the $S$, $S^\dagger$, $T$, and $T^\dagger$ operations.

We  will  now  determine  how  many  extra  hole  pairs  are  required  for  computation. 

11.3.1  Logical  CNOT  Space 

We  will  assume  that  all  data  is  represented  by  a  pair  of  smooth  holes.  To  implement  an 
arbitrary  CNOT,  we  require  space  for  2  additional  hole  pairs  in  our  layout.  This  additional 
space  is  initially  empty  and  is  initialized  prior  to  performing  the  CNOT  operation(s).  Figure 
30  shows  an  illustration  of  the  qubit  layout.  Dashed  lines  encircle  pairs  of  holes  representing 
one  logical  qubit  in  the  memory. 

CNOT operations can be done easily between smooth and rough hole pairs by using a braiding procedure in which one hole in a hole pair is grown and shrunk to "move" around another hole in another hole pair. Braiding is illustrated in Figure 31 a) and b). Performing a smooth-smooth CNOT is more complicated, and we will need our ancilla space for smooth-rough qubit conversion.

Fig. 30. Logical qubit layout with two empty spots for smooth-smooth CNOT ancilla.

b) Smooth-rough CNOT done by braiding

Fig. 31. Smooth-rough CNOT.

The schematic circuit for implementing a smooth-smooth CNOT is shown in Figure 32 a). One of the ancillas is created by puncturing a pair of smooth holes and initializing the qubit in the $|0\rangle$ state. The other ancilla is initialized as a pair of rough holes in the $|+\rangle$ state. This is followed by three smooth-rough CNOTs between the data qubits and ancillas, implemented by braiding, and finally two measurements. The measurements destroy the control and one of the ancilla qubits, and the location of the control qubit moves to one of the ancilla qubits. Figure 32 b) shows how the position of the qubits changes in the layout. This requires us to keep track of the location of the logical qubits throughout the computation. At the end, we are still left with two empty spots, which allows us to perform the next CNOT operation between any two hole pairs. Note that if we add multiple target qubits and replace the middle CNOT in the circuit with a multi-target smooth-rough CNOT, we can implement a smooth-smooth multi-target CNOT. Such circuits are particularly useful for operations such as magic state distillation, as will be described later.

It is important to note that our current resource estimation methodology assumes that multiple logical CNOT gates can be scheduled in parallel, requiring Qubits.CNOTAncilla = 2 · Algorithm.paralCNOT ancilla locations. If these CNOT gates are scheduled in parallel,
we  may  not  be  able  to  guarantee  that  we  can  find  non-overlapping  braids,  and  some  of  the 
CNOTs  must  be  scheduled  in  sequence.  However,  our  assumption  is  justified  because  all 
the  analyzed  algorithms  have  very  small  parallelization  factors  of  the  CNOT  operations  at 
the  logical  level,  and  it  is  likely  that  one  can  find  a  few  non-overlapping  braids  between 
pairs  of  qubits  dispersed  in  a  large  system.  The  feasibility  and  optimal  strategy  for  parallel 
scheduling  of  a  larger  number  of  CNOT  operations  in  the  surface  code  is  the  subject  of  our 
ongoing  investigation. 

11.3.2  Magic  State  Distillation  Space 

Here, we determine how many ancilla qubits are needed to distill a sufficient number of magic states. The operations that require ancilla qubits are the $S$, $S^\dagger$, $T$, and $T^\dagger$ gates. We need two types of ancillas, initialized in the states $|Y\rangle$ and $|A\rangle$ [37, 41]:

\[
|Y\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + i |1\rangle \right), \tag{66}
\]

\[
|A\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + e^{i\pi/4} |1\rangle \right). \tag{67}
\]

Since we can only inject low-precision ancilla states into the system, we have to distill them using distillation circuits that take multiple low-precision ancilla states and output a single ancilla state of higher precision. The $S$ and $S^\dagger$ gates require one $|Y\rangle$ ancilla and do not destroy it. The $T$ and $T^\dagger$ gates require one $|A\rangle$ ancilla (which is destroyed in the process of applying the gate) and potentially a $|Y\rangle$ ancilla, which is not destroyed because it is used in the application of an $S$ gate. Hence, it is necessary to have enough area to prepare the $|Y\rangle$ state only a few times (more precisely, max(Algorithm.paralS, Algorithm.paralT) times, to allow $S$ and $T$ gate parallelism). The $|Y\rangle$ state preparations are assumed to take place at the beginning of the computation. We assume that the $|A\rangle$ ancillas are prepared offline to reduce the time it takes to apply the $T$ and $T^\dagger$ gates. Our methodology makes the simplifying assumption that simultaneous offline preparation of Algorithm.paralT ancillas of $|A\rangle$ type is sufficient, but we note that in a real system more ancillas may be needed if the $T$ gates are applied frequently.

Fig. 32. Smooth-smooth CNOT.

To compute the resources needed to prepare the $|Y\rangle$ and $|A\rangle$ states, we need to analyze the distillation circuits for these states. Details on each of the distillation circuits are given below:

• Distillation of the $|Y\rangle$ state: The distillation circuit for $|Y\rangle$ takes as input 7 copies of the state $|Y\rangle$ with error probability $p$ and produces a state with error probability $7p^3$.

• Distillation of the $|A\rangle$ state: The distillation circuit for $|A\rangle$ takes as input 15 copies of the state $|A\rangle$ with error probability $p$ and produces a state with error probability $35p^3$.

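Concatenating these circuits allows us to achieve lower error probabilities. To achieve a desired error probability $r$, we can compute the number of distillation levels $L_1$ required to distill the $|Y\rangle$ state by solving the equations

\[
\mathrm{YError}(L_1) \le r, \tag{68}
\]

\[
\mathrm{YError}(L_1) = 7\, \mathrm{YError}(L_1 - 1)^3, \tag{69}
\]

where YError(0) is the physical error rate. Multiplying the recurrence by $\sqrt{7}$ gives $\sqrt{7}\,\mathrm{YError}(L_1) = \big(\sqrt{7}\,\mathrm{YError}(0)\big)^{3^{L_1}}$, which yields

\[
L_1 = \left\lceil \frac{1}{\log 3}\, \log\!\left( \frac{\log r + \log\sqrt{7}}{\log \mathrm{YError}(0) + \log\sqrt{7}} \right) \right\rceil. \tag{70}
\]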

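To find the number of distillation levels $L_2$ required to distill the state $|A\rangle$, we solve the recurrence

\[
\mathrm{AError}(L_2) \le r, \tag{71}
\]

\[
\mathrm{AError}(L_2) = 35\, \mathrm{AError}(L_2 - 1)^3, \tag{72}
\]

where AError(0) is the physical error rate. By the same argument, with $\sqrt{35}$ in place of $\sqrt{7}$, this yields

\[
L_2 = \left\lceil \frac{1}{\log 3}\, \log\!\left( \frac{\log r + \log\sqrt{35}}{\log \mathrm{AError}(0) + \log\sqrt{35}} \right) \right\rceil. \tag{73}
\]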


Once we have computed $L_1$ and $L_2$, we can compute the number of extra hole pairs required for the distillation. Let us first consider an abstract distillation circuit in which $n$ qubits are distilled into a single qubit. Note that the recursive distillation process can be understood in terms of a tree structure, where the final result is the outcome of operations involving several layers of source qubits. This is illustrated by example in Figure 33, where $n = 2$ and the qubits involved in $L = 3$ levels of distillation are shown. We can estimate the number of qubits required to facilitate the distillation as $n^L$, the number of leaves in the tree. Then the numbers of qubits required to distill a single ancilla $|Y\rangle$ and $|A\rangle$ are $7^{L_1}$ and $15^{L_2}$, respectively.

Fig. 33. Tree diagram illustrating state distillation.

Finally, let Qubits.|Y⟩Ancilla and Qubits.|A⟩Ancilla denote the number of qubits in the entire ancilla factory for ancillas of type $|Y\rangle$ and $|A\rangle$, respectively. Then we have:

\[
\mathrm{Qubits.}|Y\rangle\mathrm{Ancilla} = 7^{L_1} \max(\mathrm{Algorithm.paralS}, \mathrm{Algorithm.paralT}), \tag{74}
\]

\[
\mathrm{Qubits.}|A\rangle\mathrm{Ancilla} = 15^{L_2}\, \mathrm{Algorithm.paralT}. \tag{75}
\]
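The distillation levels and the resulting ancilla factory sizes can be computed by iterating the recurrences directly, which avoids any dependence on the closed-form expressions. The following Python sketch assumes the recurrences $e_L = 7 e_{L-1}^3$ for $|Y\rangle$ and $e_L = 35 e_{L-1}^3$ for $|A\rangle$ together with Eqs. (74) and (75); the function and parameter names are hypothetical.

```python
def distillation_levels(physical_error, target_error, factor):
    """Smallest L with e_L <= target_error, where e_0 = physical_error and
    e_L = factor * e_{L-1} ** 3 (factor = 7 for |Y>, 35 for |A>)."""
    if target_error <= 0.0:
        raise ValueError("target_error must be positive")
    if factor ** 0.5 * physical_error >= 1.0:
        raise ValueError("distillation does not reduce the error at this error rate")
    error, levels = physical_error, 0
    while error > target_error:
        error = factor * error ** 3
        levels += 1
    return levels


def ancilla_factory_qubits(physical_error, target_error, paral_s, paral_t):
    """Return (L1, L2, Y-factory qubits, A-factory qubits) per Eqs. (74)-(75)."""
    l1 = distillation_levels(physical_error, target_error, 7)
    l2 = distillation_levels(physical_error, target_error, 35)
    y_qubits = 7 ** l1 * max(paral_s, paral_t)
    a_qubits = 15 ** l2 * paral_t
    return l1, l2, y_qubits, a_qubits


if __name__ == "__main__":
    # Example: physical error 1e-3, target 1e-12, paralS = 2, paralT = 3.
    print(ancilla_factory_qubits(1e-3, 1e-12, paral_s=2, paral_t=3))
```

Because the error is cubed at every level, the number of levels grows very slowly with the target precision, but the factory size grows by a factor of 7 or 15 per level.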

11.4  Qubit  Layout 

Here we consider the layout of the physical qubits. It is important to note that our analysis is simplified: we do not consider some connectivity and qubit layout constraints that are specific to each physical technology. We assume that we are able to place qubits in a two-dimensional plane on a regular grid, and that nearest neighbors are always able to interact. This assumption greatly simplifies our analysis because we do not have to design a separate qubit layout for each technology. Performing a finer-grained analysis for a specific technology should be a straightforward extension of this work.

We use a square layout, where we pack $x$ hole pair locations into a grid of $\lceil\sqrt{x}\rceil \times \lceil\sqrt{x}\rceil$ hole pair locations with adequate spacing near the edges of the grid to allow braiding operations. Let the width of the grid be denoted as $w = \lceil\sqrt{x}\rceil$. The layout is shown in Figure 34 a). An alternative approach would be the linear layout shown in Figure 34 b), which may be required by some physical realizations. We do not consider the linear layout in this work, as we believe it is too restrictive for any viable quantum architecture.

Each of the blocks shown in Figure 34 a) consists of $d \times d$ physical data qubits to guarantee sufficient code distance. Each block must also contain a sufficient number of additional qubits that serve as ancillas during syndrome measurement. Considering only the data qubits, the width and height of the layout are:

\[
\mathrm{Layout.width}(x) = d \left( 3\lceil\sqrt{x}\rceil + (\lceil\sqrt{x}\rceil - 1) + 4 \right), \tag{76}
\]

\[
\mathrm{Layout.height}(x) = d \left( \lceil\sqrt{x}\rceil + (\lceil\sqrt{x}\rceil - 1) + 4 \right). \tag{77}
\]

(78) 

This means that the number of unit squares in the surface code layout is Layout.width x
Layout.height. The unit squares are the basic building blocks of the code consisting of four
data qubits, one on each of the four sides of the square. The only remaining task is to estimate
the number of physical qubits that reside in the unit square in the surface code, including
the ancillas used for syndrome measurement. This is done separately for the three syndrome
extraction methods. We note that some care must be taken to avoid qubit double counting
as two squares are incident on each data qubit.
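The layout dimensions of Equations (76) and (77) can be sketched in a few lines of Python. This is an illustrative sketch only, not the QuRE implementation (which is written in Octave); the function names are hypothetical, and the height formula is read here with a coefficient of 1 on its first term, which is an assumption about the extracted equation.

```python
import math


def ceil_sqrt(x: int) -> int:
    """Smallest integer w with w * w >= x, computed without floating point."""
    return 0 if x <= 0 else math.isqrt(x - 1) + 1


def layout_width(x: int, d: int) -> int:
    """Equation (76): layout width for x hole pairs at code distance d."""
    w = ceil_sqrt(x)
    return d * (3 * w + (w - 1) + 4)


def layout_height(x: int, d: int) -> int:
    """Equation (77), assuming a coefficient of 1 on the first term."""
    w = ceil_sqrt(x)
    return d * (w + (w - 1) + 4)


def unit_squares(x: int, d: int) -> int:
    """Number of unit squares: Layout.width x Layout.height."""
    return layout_width(x, d) * layout_height(x, d)


if __name__ == "__main__":
    # Hypothetical example: 10 hole pairs at code distance 5.
    print(layout_width(10, 5), layout_height(10, 5), unit_squares(10, 5))
```

Using an integer square root avoids rounding surprises when x is a large perfect square.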

11.4.1  Steane  Syndrome  Extraction

The circuit for Steane syndrome extraction is illustrated in Figure 35. The circuit on the left
measures the site XXXX stabilizer and the circuit on the right measures the plaquette ZZZZ
stabilizer. The four measured data qubits are denoted $d_1$ to $d_4$, and the ancilla required for
the syndrome measurement is denoted $a_1$. We see that a single ancilla suffices for each site
and plaquette stabilizer measurement. Therefore, our proposed qubit layout places one qubit
in the center of each site and plaquette. This is shown in Figure 36.

In our layout, the number of physical qubits including ancillas in each square is therefore:

Layout.QubitsPerSquareSteane = 4.  (79)

Fig.  35.  Steane syndrome extraction circuit (data qubits $d_1$ to $d_4$, ancilla $a_1$).

Fig.  36.  Qubit and ancilla layout for Steane syndrome extraction.


11.4.2  Shor  Syndrome  Extraction

The circuit for Shor syndrome extraction is illustrated in Figure 37. The circuit on the
left measures the site XXXX stabilizer and the circuit on the right measures the plaquette
ZZZZ stabilizer. The four measured qubits are denoted $d_1$ to $d_4$. Unlike for the Steane
method, a single syndrome measurement requires five ancilla qubits. Four qubits denoted
$a_1$ to $a_4$ are used to prepare an entangled cat state and the fifth ancillary qubit $a_5$ is used to
verify the state. Our proposed qubit layout places five qubits around the center of each site
and plaquette. This is shown in Figure 38.

In our layout, the number of physical qubits including ancillas in each square is therefore:

Layout.QubitsPerSquareShor = 12.  (80)


Fig.  37.  Shor syndrome extraction circuit.

Fig.  38.  Qubit and ancilla layout for Shor syndrome extraction (legend: data qubit, ancillas).


11.4.3  Knill  Syndrome  Extraction

The circuit for Knill syndrome extraction is illustrated in Figure 39. One type of circuit
is needed to measure each data qubit in the lattice. The original data qubit denoted $d_1$ is
teleported after measurement and replaces the ancillary qubit $a_2$. The two other qubits are
measured in the process. Our proposed qubit layout places two ancillary qubits right next to
each data qubit. This is shown in Figure 40. A black qubit is a data qubit, and the red and
blue qubits directly to the right are its ancillas. After measurement, the state of the qubit is
teleported to the red qubit. In the next measurement round, the two qubits that were just
measured ($d_1$ and $a_1$) become the ancillas and the state is teleported back to the original
location (into qubit $d_1$). Note that a further optimization of this layout is possible. For
quantum technologies with time-consuming state initialization, additional qubits can be used
to eliminate the need to wait for ancilla initialization. However, this optimization is beyond
the scope of this report.

In  our  layout,  the  number  of  physical  qubits  including  ancillas  in  each  square  is  therefore: 


Layout.QubitsPerSquareKnill = 6.  (81)

Fig.  39.  Knill syndrome extraction circuit.

Fig.  40.  Qubit and ancilla layout for Knill syndrome extraction.


11.5  Total  Area 

We  can  now  arrive  at  the  total  area  required  to  perform  our  computation.  To  recap,  we  will 
need  space  for: 

•  Logical qubits: the algorithm uses Algorithm.numQubits logical qubits.

•  CNOT area: we need Qubits.CNOTAncilla ancilla spaces for parallel smooth-smooth
CNOT operations.

•  Persistent $|Y\rangle$ storage: one space each to store the max(Algorithm.paralS, Algorithm.paralT)
ancillas of type $|Y\rangle$ that are used (and not destroyed) by the S and $S^\dagger$ operators.

•  Ancilla space for $|A\rangle$ distillation: offline $|A\rangle$ ancilla distillation requires
Qubits.$|A\rangle$Ancilla spaces.

•  Space for the initial $|Y\rangle$ distillation (can overlap with logical qubits and $|A\rangle$
distillation space): Initially, Qubits.$|Y\rangle$Ancilla spaces are required to build the persistent
$|Y\rangle$ state used by the S and $S^\dagger$ operators. However, this space can overlap with the
space for the logical qubits and the ancilla space for the $|A\rangle$ state, as the space will be
used only once at the beginning of the computation. After this, the space can be used
for storing other qubits.

Fig.  41.  Surface code logical operation overview.


Putting  these  together,  we  get  the  following  expression  for  the  number  of  spaces  required 
to  implement  the  algorithm: 

Hole Pairs = Qubits.CNOTAncilla + max(Algorithm.paralS, Algorithm.paralT) +
             max{Qubits.$|A\rangle$Ancilla + Algorithm.numQubits, Qubits.$|Y\rangle$Ancilla}.  (82)


Finally, we can get the total number of qubits for the Steane extraction technique as:

Physical Qubits = Layout.QubitsPerSquareSteane x Layout.width(Hole Pairs) x
                  Layout.height(Hole Pairs).  (83)


The  physical  qubit  count  for  Shor  and  Knill  techniques  is  obtained  analogously. 
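As a rough illustration of Equations (82) and (83), the sketch below computes the number of hole pairs and the resulting physical qubit count for each syndrome extraction method. The function and argument names are hypothetical, the input numbers are made up for the example, and the layout height again assumes a coefficient of 1 on its first term.

```python
import math

# Qubits per unit square, Equations (79)-(81).
QUBITS_PER_SQUARE = {"steane": 4, "shor": 12, "knill": 6}


def hole_pairs(num_qubits, paral_s, paral_t, cnot_ancilla, a_ancilla, y_ancilla):
    """Equation (82): spaces needed for data, CNOT ancillas and distillation."""
    return (cnot_ancilla
            + max(paral_s, paral_t)
            + max(a_ancilla + num_qubits, y_ancilla))


def physical_qubits(x, d, method):
    """Equation (83), generalized over the three extraction methods."""
    w = math.isqrt(x - 1) + 1 if x > 0 else 0  # ceil(sqrt(x))
    width = d * (3 * w + (w - 1) + 4)          # Equation (76)
    height = d * (w + (w - 1) + 4)             # Equation (77), assumed form
    return QUBITS_PER_SQUARE[method] * width * height


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    x = hole_pairs(num_qubits=20, paral_s=2, paral_t=3,
                   cnot_ancilla=4, a_ancilla=45, y_ancilla=21)
    for method in QUBITS_PER_SQUARE:
        print(method, x, physical_qubits(x, d=5, method=method))
```

The inner max in `hole_pairs` reflects the observation that the initial $|Y\rangle$ distillation space is needed only once and can therefore share space with the logical qubits and the $|A\rangle$ distillation area.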


12  Running  Time  Estimate 

Surface  code  computation  requires  us  to  perform  logical  operations  on  our  surface  of  qubits 
and  also  regular  error-correction  cycles.  To  compute  the  time  estimate  for  the  algorithm,  we 
first  determine  the  execution  times  for  the  key  gates  in  our  circuit. 


We build our estimates of the time required to implement logical operations from the
ground up. First, we arrive at estimates of simple logical operations. Then, more complex
logical operations are expressed in terms of the simpler operations. In the surface code,
operations usually consist of some set of physical operations followed by d rounds of error
correction to maintain fault tolerance, as shown in Figure 41. Our models quantify the cost of
both the logical operations and the error correction.


In the interest of readability, we will assume that our variables refer to times in this section
and we will abbreviate the quantum technology parameters as follows:

Prep $|0\rangle$ = Technology.Prep$|0\rangle$Time  (84)

Prep $|+\rangle$ = Technology.Prep$|+\rangle$Time  (85)

H = Technology.HTime  (86)

S = Technology.STime  (87)

$S^\dagger$ = Technology.$S^\dagger$Time  (88)

T = Technology.TTime  (89)

$T^\dagger$ = Technology.$T^\dagger$Time  (90)

CNOT = Technology.CNOTTime  (91)

Rz = Technology.RzTime  (92)

The following list discusses the estimates for a number of logical operations. Since we use
smooth hole pairs to encode our logical qubits, we will successively build up to estimates for
key "double smooth" logical operations. After this, we can write an equation for the runtime
of the algorithm.

1.  No-op  (stabilizer  measurement  cycle):  All  the  stabilizers  (site  and  plaquette)  are 
measured.  By  measuring  in  the  right  order,  these  can  be  parallelized.  The  time  is  given 
by  the  time  of  the  stabilizer  measurement  circuit  with  the  maximum  time.  Since  the 
syndrome  measurement  circuit  is  different  for  each  of  the  three  syndrome  extraction 
methods  (Steane,  Shor  and  Knill),  we  have  three  different  expressions  for  the  time  to 
perform  the  measurements. 

Steane.EC = max(Prep $|0\rangle$ + MeasX, Prep $|+\rangle$ + MeasZ) + 4CNOT  (93)

Shor.EC = max(Prep $|0\rangle$, Prep $|+\rangle$) + 4CNOT + H + max(MeasX, MeasZ)  (94)

Knill.EC = max(Prep $|0\rangle$, Prep $|+\rangle$) + 2CNOT + max(MeasX, MeasZ)  (95)

In  the  interest  of  readability,  we  will  drop  the  name  of  the  syndrome  measurement 
technique  and  simply  refer  to  the  time  needed  to  perform  syndrome  measurement  as 
EC. 

2.  Error-correction cycles for fault tolerance: The surface code requires us to maintain
d spacing not just in the spatial domain, but also in the temporal domain. To do
this, we can apply a primitive that incorporates d stabilizer measurement cycles, called
TEC. We will add this to several of our fault-tolerant operations.

TEC  =  EC  x  d  (97) 

3.  Prepare single logical hole: While we assume all logical data is to be stored in smooth
qubits, it is still necessary to prepare rough qubits to implement CNOT operations. We
assume that state preparations are followed by the error correction block.

•  Prepare smooth $|0_L\rangle$ and $|+_L\rangle$: The former requires a set of parallel X
measurements in the hole and a round of error correction with new stabilizers for the sides
of the hole. For the latter, the procedure is similar, but instead of X measurements
we must prepare the physical $|+\rangle$ state in the block. Since error correction is
incorporated, this part is bounded by the amount of error correction required for fault
tolerance.

PrepSmooth $|0_L\rangle$ = MeasX + TEC  (98)

PrepSmooth $|+_L\rangle$ = Prep $|+\rangle$ + TEC  (99)

•  Prepare rough $|0_L\rangle$ and $|+_L\rangle$: The procedures for these are similar to those for
the smooth qubits.

PrepRough $|0_L\rangle$ = Prep $|0\rangle$ + TEC  (100)

PrepRough $|+_L\rangle$ = MeasZ + TEC  (101)

4.  Measure logical hole in X or Z: These operations each require a chain of measurements
in either the X or Z basis followed by an ECC round. This will destroy the
holes. Each of the measurements in the chain can be done in parallel.

•  Smooth holes:

MeasXSmooth = MeasX + TEC  (102)

MeasZSmooth = MeasZ + TEC  (103)

•  Rough holes:

MeasXRough = MeasZ + TEC  (104)

MeasZRough = MeasX + TEC  (105)

5.  Grow  hole:  Growing  a  hole  requires  measurements  and  corrections. 

•  Smooth: To grow a smooth hole, X measurements are applied in the region adjacent
to the hole and then Z flips are applied based upon the measurements. This
also requires measuring three-term stabilizers on the sides of the hole; however,
this is taken into account by the incorporated ECC step.

GrowSmooth  =  MeasX  +  Z  +  TEC  (106) 

•  Rough: Rough holes are grown similarly, but with Z measurements and X flips.

GrowRough  =  MeasZ  +  X  +  TEC  (107) 

6.  Shrink hole: Shrinking or splitting a hole requires us to re-enforce the stabilizers in the
region that is no longer part of the hole. For smooth holes, measuring the Z stabilizers
and possibly correcting them by bit flips on the qubits, followed by error correction,
completes the shrinking operation and finishes the movement. Rough holes are shrunk
analogously.


ShrinkSmooth  =  MeasZ  +  X  +  TEC  (108) 

ShrinkRough  =  MeasX  +  Z  +  TEC  (109) 

7.  Double  holes:  It  is  useful  to  use  double  holes  to  encode  information.  In  general,  the 
costs  of  building  double  holes  are  equivalent  to  building  single  holes,  as  the  operations 
involved  in  this  process  can  be  done  in  parallel.  The  costs  for  several  double  hole 
operations  are  shown  below. 


PrepDoubleSmooth $|0_L\rangle$ = PrepSmooth $|0_L\rangle$  (110)

PrepDoubleSmooth $|+_L\rangle$ = PrepSmooth $|+_L\rangle$  (111)

MeasXDoubleSmooth = MeasXSmooth  (112)

MeasZDoubleSmooth = MeasZSmooth  (113)

PrepDoubleRough $|0_L\rangle$ = PrepRough $|0_L\rangle$  (114)

PrepDoubleRough $|+_L\rangle$ = PrepRough $|+_L\rangle$  (115)

MeasZDoubleRough = MeasZRough  (116)

MeasXDoubleRough = MeasXRough  (117)

8.  Low-level state injection (single smooth hole): This procedure (as described
in [41]) allows us to inject a small smooth hole with an arbitrary state $|\psi\rangle$ into the
surface. It requires one Z measurement, the application of four X operators (which can
be done in parallel), the application of the operator that applies $|\psi\rangle$, an X stabilizer
measurement, then a Z operator, and then four Z operators (which can be done in
parallel). An ECC step is added at the end of this procedure.

InjectSmooth = MeasZ + X + Apply $|\psi\rangle$ + EC + Z + Z + TEC  (118)

As described earlier, two key physical states that we need to perform the S and T gates
are the $|Y\rangle = \frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle)$ and $|A\rangle = \frac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle)$ states.
These are assumed to be prepared using an arbitrary Z rotation mechanism at the physical
level. Hence, the costs for injecting these states are as follows.

InjectSmooth $|A_L\rangle$ = MeasZ + X + Rz + EC + 2Z + TEC  (119)

InjectSmooth $|Y_L\rangle$ = MeasZ + X + Rz + EC + 2Z + TEC  (120)

9.  Preparing double smooth $|Y_L\rangle$ and $|A_L\rangle$: The cost to inject these states is given by
the cost to prepare the states and to grow and split them. The states are initially single
holes, grown as single holes, and split as single holes, building a double hole. These are
shown below.

PrepDoubleSmooth $|A_L\rangle$ = InjectSmooth $|A_L\rangle$ + GrowSmooth + ShrinkSmooth  (121)

PrepDoubleSmooth $|Y_L\rangle$ = InjectSmooth $|Y_L\rangle$ + GrowSmooth + ShrinkSmooth  (122)

10.  Double smooth-rough CNOT: A smooth-rough CNOT between two hole pairs is
done by braiding. The braiding operation requires two movements and is done using
one of the rough hole pieces. Hence, the operation requires a grow, a shrink, another
grow, and another shrink.

DoubleSmoothRoughCNOT = GrowRough + ShrinkRough + GrowRough + ShrinkRough  (123)

11.  Double smooth-rough multi-target CNOT: In our layout, a multi-target smooth-rough
CNOT operation involving n targets can be done with n moves, one to each target
pair, followed by one movement back. In theory, this can be done with fewer moves, but
this requires the hole pairs to be laid out in a manner such that no segment of the braiding
path intersects with itself.

DoubleSmoothRoughMTCNOT(n) = (n + 1) (GrowRough + ShrinkRough)  (124)

12.  Double smooth-smooth CNOT: A smooth-smooth CNOT between hole pairs is
done using the construction from [37], which requires a rough ancilla state and a smooth
ancilla state. The rough ancilla pair is prepared in the $|+_L\rangle$ state and the smooth pair is
prepared in the $|0_L\rangle$ state. We assume that the ancilla preparation is done offline and does
not affect the running time. Then, 3 smooth-rough CNOTs are done, followed by a smooth
and a rough measurement.

DoubleSmoothSmoothCNOT = 3DoubleSmoothRoughCNOT +
                         max(MeasZDoubleSmooth, MeasXDoubleRough)  (125)

13.  Double smooth-smooth multi-target CNOT: The difference between this operation
and the single-target double smooth-smooth CNOT is that the smooth-rough
CNOT in the middle of the circuit is replaced by a multi-target smooth-rough CNOT
operation. Hence, the cost of this operation is given as follows.

DoubleSmoothSmoothMTCNOT(n) = 2DoubleSmoothRoughCNOT +
                              DoubleSmoothRoughMTCNOT(n) +
                              max(MeasZDoubleSmooth, MeasXDoubleRough)  (126)

14.  Prepare logical double smooth $|Y_L\rangle$ and $|A_L\rangle$ to a level: Concatenated distillation
of the states $|Y_L\rangle$ and $|A_L\rangle$ stored in smooth holes is necessary to apply the S and T
gates. Here we show the costs of these circuits implemented using regular smooth-smooth
CNOT operations and also multi-target CNOTs.

•  Double smooth $|Y_L\rangle$: The distillation circuit for the $|Y_L\rangle$ state resembles a
Steane-code logical encoder with operations ordered backwards, as shown in Figure
42 a). Since the probability of needing to stop distillation is asymptotically zero,
we will assume that the distillation procedure finishes every time. The cost of a
concatenated distillation circuit of a level L is given by the following equations.

DistillDoubleSmooth($|Y_L\rangle$, L) = DistillDoubleSmooth($|Y_L\rangle$, L - 1) +
    3DoubleSmoothSmoothMTCNOT(3) +
    DoubleSmoothSmoothMTCNOT(2) +
    max(MeasZDoubleSmooth, MeasXDoubleSmooth)  (127)

DistillDoubleSmooth($|Y_L\rangle$, 0) = PrepDoubleSmooth $|Y_L\rangle$ +
    3DoubleSmoothSmoothMTCNOT(3) +
    DoubleSmoothSmoothMTCNOT(2) +
    max(MeasZDoubleSmooth, MeasXDoubleSmooth).  (128)

•  Logical $|A\rangle$: The distillation circuit for the $|A\rangle$ state is shown in Figure 42 b).
The cost of a concatenated distillation circuit is given as follows.

DistillDoubleSmooth($|A_L\rangle$, L) = DistillDoubleSmooth($|A_L\rangle$, L - 1) +
    4DoubleSmoothSmoothMTCNOT(7) +
    DoubleSmoothSmoothMTCNOT(6) +
    max(MeasZDoubleSmooth, MeasXDoubleSmooth)  (129)

DistillDoubleSmooth($|A_L\rangle$, 0) = PrepDoubleSmooth $|A_L\rangle$ +
    4DoubleSmoothSmoothMTCNOT(7) +
    DoubleSmoothSmoothMTCNOT(6) +
    max(MeasZDoubleSmooth, MeasXDoubleSmooth)  (130)

Fig.  42.  Distillation circuits. b) A state distillation circuit.

15.  Double smooth S and $S^\dagger$: The S gate on a smooth hole is done using the circuit shown
in Figure 43 b), which is from [37]. This requires a double smooth ancilla state $|Y_L\rangle$
(which we assumed was prepared at the start of the computation) and the application
of two Hadamards and two CNOTs. The inverse $S^\dagger$ is obtained by running the circuit
backwards.

DoubleSmoothS = 2DoubleSmoothSmoothCNOT + 2DoubleSmoothH  (131)

DoubleSmooth$S^\dagger$ = DoubleSmoothS  (132)

16.  Double smooth T and $T^\dagger$: The T gate on a double smooth hole can be done using
a measurement-based circuit. This circuit is shown in Figure 43 a). To implement the
T gate, the input ancilla state $\frac{1}{\sqrt{2}}(|0\rangle + e^{i\theta}|1\rangle)$ is initialized to be the $|A\rangle$ state. We
assume that the ancilla is prepared offline and does not affect the running time. If the
measurement (in the Z basis) indicates that the T gate has been applied, the procedure
is done. If not, the $T^\dagger$ operation has actually been applied and an S gate needs to
be applied to correct the state. Both events occur with equal probability. The
cost of these operations is given below; the factor 1/2 represents the probability of the
application of the S gate.

DoubleSmoothT = DoubleSmoothSmoothCNOT +
                MeasZDoubleSmooth +
                (1/2) DoubleSmoothS  (133)

DoubleSmooth$T^\dagger$ = DoubleSmoothT  (134)

Fig.  43.  Measurement-based rotation gate and S gate. a) Rotation gate using ancilla state.
b) S gate using ancilla state.


17.  Double  smooth  H:  To  perform  the  logical  H  gate,  we  can  use  a  measurement-based 
rotation  procedure  similar  to  the  procedure  for  implementing  the  T  gate.  However, 
as  described  in  [37],  it  is  possible  to  do  this  with  lower  overhead  using  the  following 
procedure.  First  a  patch  of  the  surface  surrounding  smooth  holes  is  cut  out  using 
a  chain  of  Z  measurements,  followed  by  an  error  correction  round  needed  to  correct 
the  sign  of  newly  created  weight  three  Z  stabilizers.  Next,  a  transversal  round  of  H 
operations  is  done  inside  the  patch.  A  set  of  physical  swap  gates  are  then  required  to 
reconnect  the  patch  to  the  surface  by  shifting  the  patch  (this  requires  as  many  cycles  as 
the  size  of  the  patch,  which  is  assumed  to  be  3d).  Then  another  error  correction  round 
needs to be done. At this point, the qubit was converted to a rough one and needs
to be converted back. This requires a smooth ancilla in the $|+\rangle$ state, a CNOT gate,
followed by a measurement of the rough qubit in the Z basis. The cost of this operation
is detailed below.

DoubleSmoothH = MeasZ + TEC + H + 3d x 3 x CNOT + TEC +
                DoubleSmoothPrep $|+\rangle$ + DoubleRoughSmoothCNOT +
                MeasZDoubleSmooth  (135)
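To make the bottom-up construction concrete, the following sketch evaluates a few of the timing expressions above (Equations (93)-(97), (107), (109) and (123)-(126)) for a hypothetical technology. The dictionary keys and the numeric gate times are illustrative assumptions, not values from this report, and the code is not part of the QuRE toolbox.

```python
def ec_time(t, method):
    """Time of one stabilizer measurement cycle, Equations (93)-(95)."""
    prep = max(t["prep0"], t["prep_plus"])
    meas = max(t["meas_x"], t["meas_z"])
    if method == "steane":
        return max(t["prep0"] + t["meas_x"],
                   t["prep_plus"] + t["meas_z"]) + 4 * t["cnot"]
    if method == "shor":
        return prep + 4 * t["cnot"] + t["h"] + meas
    if method == "knill":
        return prep + 2 * t["cnot"] + meas
    raise ValueError(f"unknown syndrome extraction method: {method}")


def logical_times(t, method, d):
    """Times of selected logical operations at code distance d."""
    tec = ec_time(t, method) * d                    # Equation (97)
    grow_rough = t["meas_z"] + t["x"] + tec         # Equation (107)
    shrink_rough = t["meas_x"] + t["z"] + tec       # Equation (109)
    move = grow_rough + shrink_rough
    sr_cnot = 2 * move                              # Equation (123)
    meas_z_double_smooth = t["meas_z"] + tec        # Equations (103), (113)
    meas_x_double_rough = t["meas_z"] + tec         # Equations (104), (117)
    final_meas = max(meas_z_double_smooth, meas_x_double_rough)
    ss_cnot = 3 * sr_cnot + final_meas              # Equation (125)

    def ss_mt_cnot(n):
        # Equation (126), with Equation (124) for the multi-target part.
        return 2 * sr_cnot + (n + 1) * move + final_meas

    return {"TEC": tec, "sr_cnot": sr_cnot,
            "ss_cnot": ss_cnot, "ss_mt_cnot": ss_mt_cnot}


# Hypothetical gate times in arbitrary units.
TECH = {"prep0": 10, "prep_plus": 12, "meas_x": 20, "meas_z": 25,
        "cnot": 5, "h": 2, "x": 1, "z": 1}

if __name__ == "__main__":
    times = logical_times(TECH, "steane", d=3)
    print(times["TEC"], times["sr_cnot"], times["ss_cnot"], times["ss_mt_cnot"](3))
```

Because almost every logical operation ends with a TEC block, the d-fold error-correction term dominates each of these times in this toy example.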


From  these,  we  can  determine  the  execution  time  of  the  algorithm  as  follows. 

Execution Time = DistillDoubleSmooth($|Y_L\rangle$, $L_1$) +  (136)
  (1/Algorithm.paralPrep$|0\rangle$) x Algorithm.numPrep$|0\rangle$ x PrepDoubleSmooth $|0_L\rangle$ +  (137)
  (1/Algorithm.paralPrep$|+\rangle$) x Algorithm.numPrep$|+\rangle$ x PrepDoubleSmooth $|+_L\rangle$ +  (138)
  (1/Algorithm.paralH) x Algorithm.numH x DoubleSmoothH +  (139)
  (1/Algorithm.paralS) x Algorithm.numS x DoubleSmoothS +  (140)
  (1/Algorithm.paral$S^\dagger$) x Algorithm.num$S^\dagger$ x DoubleSmooth$S^\dagger$ +  (141)
  (1/Algorithm.paralT) x Algorithm.numT x DoubleSmoothT +  (142)
  (1/Algorithm.paral$T^\dagger$) x Algorithm.num$T^\dagger$ x DoubleSmooth$T^\dagger$ +  (143)
  (1/Algorithm.paralCNOT) x Algorithm.numCNOT x DoubleSmoothSmoothCNOT +  (144)
  (1/Algorithm.paralMeasX) x Algorithm.MeasX x MeasXSmooth +  (145)
  (1/Algorithm.paralMeasZ) x Algorithm.MeasZ x MeasZSmooth  (146)
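The structure of Equations (136)-(146) is a sum over gate types of (gate count / parallelization factor) x (logical gate time), plus the one-time $|Y_L\rangle$ distillation. A minimal sketch, assuming hypothetical dictionaries for the per-algorithm inputs and made-up numbers:

```python
def execution_time(num, paral, op_time, distill_y_time):
    """Sum of Equations (136)-(146).

    num[g]     -- number of logical gates of type g in the algorithm
    paral[g]   -- how many gates of type g can be scheduled in parallel
    op_time[g] -- time of one logical gate of type g
    distill_y_time -- one-time cost DistillDoubleSmooth(|Y_L>, L_1)
    """
    total = distill_y_time
    for gate, count in num.items():
        total += (count / paral[gate]) * op_time[gate]
    return total


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    num = {"H": 100, "T": 40, "CNOT": 200}
    paral = {"H": 4, "T": 2, "CNOT": 5}
    op_time = {"H": 10.0, "T": 50.0, "CNOT": 30.0}
    print(execution_time(num, paral, op_time, distill_y_time=500.0))
```

In a full estimate the dictionaries would carry one entry per row of the equation (state preparations, H, S, $S^\dagger$, T, $T^\dagger$, CNOT and the two measurements).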

13  Number  of  Gates 

Our  remaining  task  is  to  estimate  the  number  of  gates  needed  by  the  entire  algorithm.  We 
obtain  an  accurate  gate  count  for  each  gate  type  as  follows.  First,  we  estimate  the  number 
of  gates  needed  to  perform  error  correction.  Since  error  correction  is  performed  continuously, 
and all other operations require a small number of gates, this is the dominant factor in the
final gate count. The gate count for error correction is estimated for each of the three syndrome
extraction  techniques  (Shor,  Steane,  Knill)  separately.  Second,  we  add  the  small  number  of 
gates  required  to  perform  logical  operations.  Note  that  these  gate  counts  exclude  syndrome 
measurements  (to  prevent  double  counting),  and  therefore  the  estimate  is  independent  of  the 
syndrome  extraction  method  of  choice. 

In  the  interest  of  readability,  we  will  assume  that  our  variables  refer  to  gates  (i.e.,  the 
variable  is  a  vector  with  each  entry  corresponding  to  one  gate  type).  For  example,  Prep  |0) 
refers  to  a  vector  that  is  almost  everywhere  zero  except  a  one  in  the  entry  that  represents 
the  |0)  state  preparation.  Similarly,  H  represents  the  Hadamard  gate  in  the  above  vector 
notation,  etc. 


1.  No-op (stabilizer measurement cycle): All the stabilizers (site and plaquette)
are measured. Since the syndrome measurement circuit is different for each of the
three syndrome extraction methods (Steane, Shor and Knill), we have three different
expressions for the gate count:

Steane.EC = Prep $|0\rangle$ + Prep $|+\rangle$ + 8CNOT + MeasX + MeasZ  (147)

Shor.EC = 8Prep $|0\rangle$ + 2Prep $|+\rangle$ + 18CNOT + 4MeasX + 6MeasZ + 4H  (148)

Knill.EC = Prep $|0\rangle$ + Prep $|+\rangle$ + 2CNOT + MeasX + MeasZ  (149)

Below  we  express  the  number  of  gates  that  need  to  be  performed  (excluding  syndrome 
measurement)  to  implement  each  gate.  We  follow  the  same  notation  as  in  the  previous 
section. 

2.  Error-correction  cycles  for  fault  tolerance 

TEC  =  EC  x  d  (150) 

3.  Prepare  single  logical  hole: 

•  Prepare smooth $|0_L\rangle$ and $|+_L\rangle$

PrepSmooth $|0_L\rangle$ = d x d x MeasX  (151)

PrepSmooth $|+_L\rangle$ = d x d x Prep $|+\rangle$  (152)

•  Prepare rough $|0_L\rangle$ and $|+_L\rangle$

PrepRough $|0_L\rangle$ = d x d x Prep $|0\rangle$  (153)

PrepRough $|+_L\rangle$ = d x d x MeasZ  (154)

4.  Measure  logical  hole  in  X  or  Z : 

•  Smooth  holes 


MeasXSmooth  =  3  x  d  x  d  x  MeasX  (155) 

MeasZSmooth  =  8  x  d  x  d  x  MeasZ  (156) 

•  Rough  holes 

MeasXRough  =  8  x  d  x  d  x  MeasZ  (157) 

MeasZRough  =  3  x  d  x  d  x  MeasX  (158) 


5.  Grow hole: It is not possible to accurately quantify the number of gates when the size
of the region into which the hole grows is unknown. However, the number of gates does
not contribute significantly to the final gate count, and is reported here as 0.

GrowSmooth = 0  (159)

GrowRough = 0  (160)

6.  Shrink hole: As for hole growth, we can assume that the additional gate count
is negligible.

ShrinkRough = 0  (161)

ShrinkSmooth = 0  (162)

7.  Double  holes: 


PrepDoubleSmooth $|0_L\rangle$ = 2PrepSmooth $|0_L\rangle$  (163)

PrepDoubleSmooth $|+_L\rangle$ = 2PrepSmooth $|+_L\rangle$  (164)

MeasXDoubleSmooth = MeasXSmooth  (165)

MeasZDoubleSmooth = 2MeasZSmooth  (166)

PrepDoubleRough $|0_L\rangle$ = 2PrepRough $|0_L\rangle$  (167)

PrepDoubleRough $|+_L\rangle$ = 2PrepRough $|+_L\rangle$  (168)

MeasZDoubleRough = MeasZRough  (169)

MeasXDoubleRough = 2MeasXRough  (170)

8.  Low-level  state  injection  (single  smooth  hole): 

InjectSmooth $|A_L\rangle$ = MeasZ + 4X + 6Z  (171)

InjectSmooth $|Y_L\rangle$ = MeasZ + 4X + 6Z  (172)

9.  Preparing double smooth $|Y_L\rangle$ and $|A_L\rangle$

PrepDoubleSmooth $|A_L\rangle$ = InjectSmooth $|A_L\rangle$  (173)

PrepDoubleSmooth $|Y_L\rangle$ = InjectSmooth $|Y_L\rangle$  (174)

10.  Double smooth-rough CNOT

DoubleSmoothRoughCNOT = 0  (175)


11.  Double  smooth-rough  multi-target  CNOT 

DoubleSmoothRoughMTCNOT(n)  =  0  (177) 


12.  Double smooth-smooth CNOT

DoubleSmoothSmoothCNOT = PrepDoubleSmooth $|0_L\rangle$ +
                         PrepDoubleRough $|+_L\rangle$ +
                         MeasZDoubleSmooth +
                         MeasXDoubleRough  (178)

13.  Double smooth-smooth multi-target CNOT

DoubleSmoothSmoothMTCNOT(n) = PrepDoubleSmooth $|0_L\rangle$ +
                              PrepDoubleRough $|+_L\rangle$ +
                              MeasZDoubleSmooth +
                              MeasXDoubleRough  (179)

14.  Prepare logical double smooth $|Y_L\rangle$ and $|A_L\rangle$ to a level.

•  Double smooth $|Y_L\rangle$

DistillDoubleSmooth($|Y_L\rangle$, L) = 7DistillDoubleSmooth($|Y_L\rangle$, L - 1) +
    3DoubleSmoothSmoothMTCNOT(3) +
    DoubleSmoothSmoothMTCNOT(2) +
    3MeasZDoubleSmooth +
    3MeasXDoubleSmooth  (180)

DistillDoubleSmooth($|Y_L\rangle$, 0) = 7PrepDoubleSmooth $|Y_L\rangle$ +
    3DoubleSmoothSmoothMTCNOT(3) +
    DoubleSmoothSmoothMTCNOT(2) +
    3MeasZDoubleSmooth +
    3MeasXDoubleSmooth.  (181)

•  Logical $|A\rangle$

DistillDoubleSmooth($|A_L\rangle$, L) = 15DistillDoubleSmooth($|A_L\rangle$, L - 1) +
    5DoubleSmoothSmoothMTCNOT(7) +
    DoubleSmoothSmoothMTCNOT(6) +
    10MeasZDoubleSmooth +
    4MeasXDoubleSmooth  (182)

DistillDoubleSmooth($|A_L\rangle$, 0) = 15PrepDoubleSmooth $|A_L\rangle$ +
    5DoubleSmoothSmoothMTCNOT(7) +
    DoubleSmoothSmoothMTCNOT(6) +
    10MeasZDoubleSmooth +
    4MeasXDoubleSmooth.  (183)

15.  Double smooth S and $S^\dagger$

DoubleSmoothS = 2DoubleSmoothSmoothCNOT + 2DoubleSmoothH  (184)

DoubleSmooth$S^\dagger$ = DoubleSmoothS  (185)

16.  Double smooth T and $T^\dagger$

DoubleSmoothT = DistillDoubleSmooth($|A_L\rangle$, $L_2$) +
                DoubleSmoothSmoothCNOT +
                MeasZDoubleSmooth +
                (1/2) DoubleSmoothS  (186)

DoubleSmooth$T^\dagger$ = DoubleSmoothT.  (187)

17.  Double  smooth  H 

DoubleSmoothH = 20 x d x MeasZ + 21 x d x d x H + 3d x 3 x CNOT +
                DoubleSmoothPrep $|+\rangle$ + DoubleRoughSmoothCNOT +
                MeasZDoubleSmooth.  (188)
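In this section each variable is a vector with one entry per gate type. One way to mimic that notation is with Python's `collections.Counter`, where addition of counters corresponds to vector addition. The sketch below encodes Equations (147)-(150) and a few of the derived counts; the key names and helper functions are hypothetical and are not part of the QuRE toolbox.

```python
from collections import Counter


def scaled(gates, k):
    """Multiply every entry of a gate-count vector by the integer k."""
    return Counter({g: n * k for g, n in gates.items()})


# One stabilizer measurement cycle, Equations (147)-(149).
EC = {
    "steane": Counter(prep0=1, prep_plus=1, cnot=8, meas_x=1, meas_z=1),
    "shor": Counter(prep0=8, prep_plus=2, cnot=18, meas_x=4, meas_z=6, h=4),
    "knill": Counter(prep0=1, prep_plus=1, cnot=2, meas_x=1, meas_z=1),
}


def tec(method, d):
    """Equation (150): d stabilizer measurement cycles."""
    return scaled(EC[method], d)


def prep_double_smooth_zero(d):
    """Equations (151) and (163): 2 x (d x d x MeasX)."""
    return Counter(meas_x=2 * d * d)


def prep_double_rough_plus(d):
    """Equations (154) and (168): 2 x (d x d x MeasZ)."""
    return Counter(meas_z=2 * d * d)


def meas_z_double_smooth(d):
    """Equations (156) and (166): 2 x (8 x d x d x MeasZ)."""
    return Counter(meas_z=2 * 8 * d * d)


def meas_x_double_rough(d):
    """Equations (157) and (170): 2 x (8 x d x d x MeasZ)."""
    return Counter(meas_z=2 * 8 * d * d)


def smooth_smooth_cnot(d):
    """Equation (178): gate count excluding syndrome measurement."""
    return (prep_double_smooth_zero(d) + prep_double_rough_plus(d)
            + meas_z_double_smooth(d) + meas_x_double_rough(d))


if __name__ == "__main__":
    print(dict(tec("shor", 3)))
    print(dict(smooth_smooth_cnot(3)))
```

Note that `Counter` addition silently drops non-positive entries, which is harmless here because all gate counts are non-negative.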

By dividing the execution time by the time needed to do one error correction, both obtained in
the previous section, we obtain the number of error-correction cycles, denoted EC Cycles.
From the above equations, we can determine the total gate count during the execution of the
algorithm.

Gate Count = Physical Area x EC Cycles x EC +  (189)
  DistillDoubleSmooth($|Y_L\rangle$, $L_1$) +  (190)
  Algorithm.numPrep$|0\rangle$ x PrepDoubleSmooth $|0_L\rangle$ +  (191)
  Algorithm.numPrep$|+\rangle$ x PrepDoubleSmooth $|+_L\rangle$ +  (192)
  Algorithm.numH x DoubleSmoothH +  (193)
  Algorithm.numS x DoubleSmoothS +  (194)
  Algorithm.num$S^\dagger$ x DoubleSmooth$S^\dagger$ +  (195)
  Algorithm.numT x DoubleSmoothT +  (196)
  Algorithm.num$T^\dagger$ x DoubleSmooth$T^\dagger$ +  (197)
  Algorithm.numCNOT x DoubleSmoothSmoothCNOT +  (198)
  Algorithm.MeasX x MeasXSmooth +  (199)
  Algorithm.MeasZ x MeasZSmooth  (200)
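The total gate count of Equations (189)-(200) combines the continuously running error correction with the per-operation counts. The following sketch shows the shape of that calculation under stated assumptions: gate vectors are represented as `Counter` objects, EC Cycles is rounded up to an integer (the report does not say whether it is rounded), and all names and numbers are hypothetical.

```python
import math
from collections import Counter


def scaled(gates, k):
    """Multiply every entry of a gate-count vector by the integer k."""
    return Counter({g: n * k for g, n in gates.items()})


def total_gate_count(physical_area, execution_time, ec_time, ec_gates, logical_ops):
    """Equations (189)-(200).

    physical_area  -- number of unit squares undergoing error correction
    execution_time -- running time of the algorithm
    ec_time        -- time of one error-correction cycle (same unit)
    ec_gates       -- gate vector of one error-correction cycle
    logical_ops    -- iterable of (count, gate_vector) pairs, one per term
                      of Equations (190)-(200)
    """
    # Assumption: a partial cycle counts as a full one.
    ec_cycles = math.ceil(execution_time / ec_time)
    total = scaled(ec_gates, physical_area * ec_cycles)   # Equation (189)
    for count, gates in logical_ops:
        total += scaled(gates, count)
    return total


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    steane_ec = Counter(prep0=1, prep_plus=1, cnot=8, meas_x=1, meas_z=1)
    ops = [(5, Counter(meas_z=72))]
    print(dict(total_gate_count(100, 570.0, 57.0, steane_ec, ops)))
```

Even in this toy example the error-correction term is far larger than the logical-operation terms, in line with the remark that error correction is the dominant factor in the final gate count.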

14  Software  Tools  for  Resource  Estimation 


The QuRE toolbox is implemented as a suite of Octave scripts that automatically generate
the resource estimates for a cross product of algorithms, quantum technologies, and
error-correction techniques. The software processes the following inputs. For each analyzed
algorithm, we require the number of logical gates of each type, the parallelization factor for these
gates (how many gates of each type can be safely scheduled in parallel), and the number of
logical qubits. These inputs are located in the "Alg_*.m" files. Another required input is
information about each combination of physical quantum technology and control protocol.
The required information is the time needed by each gate type, the error of the worst gate,
and the error rate per unit of time. These inputs are located in a separate "PMD_*.m" file
for each quantum technology / control protocol combination.

The heart of the software is the set of recursive relations specifying the amount of resources
needed by the error-correcting codes at each level of concatenation, as well as the equations
specifying the behavior of the surface code. These relations appear in this report, and are
implemented in the "ECC_*.m" scripts.

Results for all combinations of algorithms, physical quantum technologies, and error-correction
protocols (the Chinese menu) are generated by running the "Print.m" script. Estimates are
written to the "out" directory and the "table" directory. In the "out" directory, one file is
generated for each entry in the Chinese menu. The "table" directory contains the results in a
succinct form; these results are also shown in Section 15 of this report.
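Enumerating the menu amounts to taking a Cartesian product of the input lists. The Python sketch below illustrates this with the abbreviations from the report's list of abbreviations; it is not the toolbox's script. The output file naming is hypothetical, and control protocols are left out because the set of applicable protocols depends on the technology.

```python
from itertools import product

ALGORITHMS = ["BWT", "BFA", "CNA", "GSE", "QLS", "SVP", "TFP"]
TECHNOLOGIES = ["DOT", "NEU", "PH1", "PH2", "SUP", "TRA"]
CODES = ["STE", "BSH", "C46", "TOP"]


def menu_entries():
    """Yield one (algorithm, technology, code, output path) tuple per menu entry."""
    for alg, tech, code in product(ALGORITHMS, TECHNOLOGIES, CODES):
        # Hypothetical naming scheme for the per-entry output file.
        yield alg, tech, code, f"out/{alg}_{tech}_{code}.txt"


entries = list(menu_entries())
```

With seven algorithms, six technologies, and four codes, this yields 7 × 6 × 4 = 168 entries before control protocols are taken into account.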

A  typical  output  in  the  “out”  directory  includes  the  following  entries: 

•  Whether the error-correction threshold is met, and the target level of concatenation
needed to reach 50% circuit reliability. For the surface code, the code distance is reported
instead of the concatenation level.

•  Probability of success of the computation.

•  Number of physical qubits needed. For the Bacon-Shor code and Knill's post-selection
scheme, which use ancilla factories, we also report the number of qubits needed by the
ancilla factory in addition to the total qubit count.

•  Running time of the algorithm in ns.

•  The length of the time interval after which idle qubits must be error corrected, and the
total number of error-correction rounds that must be performed.

•  A detailed gate count that includes the number of occurrences of each physical gate
construct during the entire computation. We report the following gate constructs: CNOT,
SWAP, H, |+⟩ prep., |0⟩ prep., X meas., Z meas., X, Y, Z, S, and T.
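The first entry above, the target level of concatenation for 50% circuit reliability, can be illustrated with the textbook threshold-theorem scaling. The Python sketch below is not the calculation performed by the toolbox, which uses the code-specific relations given earlier in this report. It assumes the common approximation p_L(m) = p_th (p / p_th)^(2^m) for the logical error rate at level m and independent failures across logical gates; the function name and the numbers are hypothetical.

```python
import math
from typing import Optional


def min_concatenation_level(p_phys: float,
                            p_th: float,
                            n_logical_gates: float,
                            target: float = 0.5,
                            max_level: int = 10) -> Optional[int]:
    """Smallest level m whose estimated circuit success probability reaches `target`.

    Returns None if the physical error rate does not meet the threshold or if
    no level up to `max_level` suffices.
    """
    if p_phys >= p_th:
        return None  # threshold not met: concatenation does not help
    for m in range(1, max_level + 1):
        p_logical = p_th * (p_phys / p_th) ** (2 ** m)
        # (1 - p_logical) ** n, evaluated stably for very small p_logical.
        success = math.exp(n_logical_gates * math.log1p(-p_logical))
        if success >= target:
            return m
    return None


# Hypothetical example: error rate ten times below threshold, 1e12 logical gates.
example_level = min_concatenation_level(1e-5, 1e-4, 1e12)
```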

For the Bacon-Shor code and Knill's post-selection scheme, which require ancilla factories,
the following information is also reported:

•  Time needed to produce and inject an ancillary state.

•  A detailed gate count specifying the number of gates needed to produce one good
ancillary state.

•  The number of gates needed to produce all ancillary states required by all logical S and
T gates that occur in the algorithm.

Inspecting the results generated by our tool led to the following observations:

•  Unlike topological error-correcting codes, the concatenated codes do not meet the
threshold for most quantum architectures.

•  The number of gates and the running time for topological and concatenated codes are
comparable.

•  Knill's post-selection scheme has a higher threshold than the Bacon-Shor and Steane
codes; it therefore often requires fewer levels of concatenation, and its resource
requirements are lower.

The outputs of the software were validated by comparing them against the equations
presented in this report. Since the input data for all algorithms and quantum technology /
control protocol combinations have the same format, it was sufficient to validate the results
for one algorithm and one quantum technology / control protocol combination (we chose the
ground state estimation algorithm and superconducting qubits with primitive control). We
verified the results for all four error-correcting codes.

The outputs for all entries of the Chinese menu appear in the "out" and "table" directories,
and selected results are also reported in Section 15.

15  Numerical  Results 

Here  we  present  an  overview  of  the  numerical  results  obtained  by  the  QuRE  toolbox.  We 
only  report  the  most  basic  properties  for  each  error  correcting  code,  namely  the  duration 
and  number  of  gates  required  to  execute  a  single  logical  gate  at  a  specified  concatenation 
level  or  code  distance.  For  each  combination  of  error  correcting  code,  algorithm,  and  physical 
quantum  technology  we  also  report  the  number  of  qubits,  number  of  gates  and  running  time 
required  by  the  quantum  computer.  More  detailed  results  can  be  obtained  by  running  the 
QuRE  toolbox. 

The gate time for a logical gate at a specified level of concatenation of the concatenated
codes is obtained by QuRE by solving the recursive equations in Sections 6, 7, and 8.
Tables 6, 7, and 8 show the results for the Steane code, the Bacon-Shor code, and the Knill
scheme, respectively. The tables show results for one to five levels of concatenation. The
reported gate times are based on the superconducting quantum technology with primitive
control. Table 9 summarizes the gate times for the surface code, again using superconductors
with primitive control. The columns of the table show gate times for various choices of code
distance. Note that the surface code does not need the SWAP operation, and we do not need
to distinguish between a horizontal and a vertical CNOT because the cost of both gates is
identical. We observe that while the gate time for concatenated codes increases sharply with
increasing level of concatenation, the gate time for the surface code increases only
moderately with increasing code distance.
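This contrast can be checked directly from the tabulated gate times. The Python sketch below is not part of the toolbox. It takes the EC times of the Steane code and the CNOT times of the surface code from the tables in this section and computes the growth factor per concatenation level and the time per unit of code distance. The pairing of the surface-code values with the distances 3, 21, 51, and 101 is an assumption, since the column headers of that table are incomplete.

```python
# EC gate time (ns) for the Steane code at concatenation levels m = 1..5.
steane_ec_ns = [876, 2.16e4, 5.8e5, 1.58e7, 4.31e8]

# CNOT gate time (ns) for the surface code; the distance labels are assumed.
surface_cnot_ns = {3: 6.71e3, 21: 4.56e4, 51: 1.1e5, 101: 2.18e5}

# Multiplicative growth from one concatenation level to the next.
level_ratios = [b / a for a, b in zip(steane_ec_ns, steane_ec_ns[1:])]

# Gate time divided by code distance: roughly constant if growth is linear in d.
time_per_distance = {d: t / d for d, t in surface_cnot_ns.items()}

for m, r in enumerate(level_ratios, start=2):
    print(f"Steane EC, level {m - 1} -> {m}: x{r:.1f}")
for d, t in time_per_distance.items():
    print(f"surface CNOT, d={d}: {t:.0f} ns per unit distance")
```

Under these assumptions each additional level of concatenation multiplies the Steane EC time by roughly 25 to 27, whereas the surface-code CNOT time stays close to 2.2e+03 ns per unit of code distance, that is, it grows approximately linearly with d.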

The number of gates of each type required to execute the error correction operation (EC)
with the concatenated codes is reported in Tables 10, 11, and 12. The tables show results
for one to five levels of concatenation. We report the gate count for the error correction
operation because it is the most frequently repeated operation during computation, and
error correction is performed as part of every other gate. A detailed gate count for any
logical operation at any concatenation level can be obtained from the QuRE toolbox, but we
do not report these results here because the cost of error correction is the dominant factor.

The resources required by a quantum computer for each combination of algorithm, physical
quantum technology, control protocol, and error-correcting code appear in Table 14. The first
five columns identify the combination that is being evaluated, the next three columns show the
resources, and the last column shows the level of concatenation or code distance. Due to space
constraints, the algorithm, physical quantum technology, error-correcting code, and syndrome
extraction method are identified by abbreviated labels. These abbreviations are summarized
in Table 13. The reported resources can be interpreted as follows. The number of qubits is
the total number of physical qubits needed by the quantum computer. The number of gates
is the total number of physical gates (across all types) that need to be executed. Finally, the
running time is the total time in nanoseconds needed to finish the computation.
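Because running times are reported in nanoseconds, the raw figures are hard to judge at a glance. The short Python sketch below, which is not part of the toolbox, converts a running time to years. The example value 5.69e+18 is taken from the first row of the final resource estimates table, under the assumption that the third resource column is the running time; the function name is hypothetical.

```python
NS_PER_SECOND = 1e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def ns_to_years(time_ns: float) -> float:
    """Convert a running time in nanoseconds to years."""
    return time_ns / NS_PER_SECOND / SECONDS_PER_YEAR


# Assumed running-time entry of the first row (BWT, NEU, OPT, TOP, KNI).
example_years = ns_to_years(5.69e18)
print(f"{example_years:.0f} years")
```

Under that assumption the entry corresponds to a running time of roughly 180 years.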

Table 6. Gate time (in ns) for Steane code.

Operation(s)  m=1       m=2       m=3       m=4       m=5
EC            876       2.16e+04  5.8e+05   1.58e+07  4.31e+08
hCNOT         1.02e+03  2.97e+04  8.16e+05  2.23e+07  6.09e+08
vCNOT         1.00e+03  2.85e+04  7.76e+05  2.11e+07  5.77e+08
hSWAP         1.01e+03  2.97e+04  8.17e+05  2.23e+07  6.1e+08
vSWAP         978       2.74e+04  7.44e+05  2.03e+07  5.53e+08
H             882       2.24e+04  6.02e+05  1.64e+07  4.47e+08
P|+⟩          1.28e+03  3.05e+04  8.19e+05  2.23e+07  6.1e+08
P|0⟩          1.28e+03  3.05e+04  8.19e+05  2.23e+07  6.1e+08
Mx            876       2.16e+04  5.8e+05   1.58e+07  4.31e+08
Mz            876       2.16e+04  5.8e+05   1.58e+07  4.31e+08
X             886       2.25e+04  6.02e+05  1.64e+07  4.47e+08
Y             886       2.25e+04  6.02e+05  1.64e+07  4.47e+08
Z             877       2.24e+04  6.02e+05  1.64e+07  4.47e+08
Pcat          1.17e+03  3.29e+04  9.02e+05  2.46e+07  6.72e+08
S             5.29e+03  1.54e+05  4.22e+06  1.15e+08  3.14e+09
T             5.29e+03  1.54e+05  4.22e+06  1.15e+08  3.14e+09

Table 7. Gate time (in ns) for Bacon-Shor code.

Operation(s)  m=1       m=2       m=3       m=4       m=5
EC            326       4.13e+03  6.05e+04  9.21e+05  1.42e+07
hCNOT         484       8.31e+03  1.31e+05  2.04e+06  3.15e+07
vCNOT         484       8.31e+03  1.31e+05  2.04e+06  3.15e+07
hSWAP         462       7.83e+03  1.23e+05  1.91e+06  2.94e+07
vSWAP         462       7.83e+03  1.23e+05  1.91e+06  2.94e+07
H             828       1.37e+04  2.13e+05  3.29e+06  5.07e+07
P|+⟩          760       9.13e+03  1.33e+05  2.03e+06  3.12e+07
P|0⟩          760       9.13e+03  1.33e+05  2.03e+06  3.12e+07
Mx            342       4.48e+03  6.49e+04  9.86e+05  1.52e+07
Mz            336       4.47e+03  6.49e+04  9.86e+05  1.52e+07
X             336       4.47e+03  6.49e+04  9.86e+05  1.52e+07
Y             337       4.8e+03   6.94e+04  1.05e+06  1.62e+07
Z             327       4.46e+03  6.49e+04  9.86e+05  1.52e+07
Pcat          0         0         0         0         0
S             2.62e+03  4.41e+04  6.89e+05  1.06e+07  1.64e+08
T             3.44e+03  5.68e+04  8.85e+05  1.37e+07  2.11e+08
Table 8. Gate time (in ns) for Knill/Steane code.

Operation(s)  m=1       m=2       m=3       m=4       m=5
EC            912       5.21e+03  5.72e+04  6.38e+05  7.08e+06
hCNOT         1.05e+03  7.71e+03  8.43e+04  9.32e+05  1.04e+07
vCNOT         1.04e+03  7.53e+03  8.12e+04  8.97e+05  9.97e+06
hSWAP         358       4.12e+03  4.46e+04  4.94e+05  5.49e+06
vSWAP         324       3.51e+03  3.82e+04  4.24e+05  4.71e+06
H             918       5.44e+03  5.89e+04  6.57e+05  7.3e+06
P|+⟩          640       3.94e+03  4.43e+04  4.93e+05  5.48e+06
P|0⟩          640       3.93e+03  4.43e+04  4.93e+05  5.48e+06
Mx            912       5.21e+03  5.72e+04  6.38e+05  7.08e+06
Mz            912       5.21e+03  5.72e+04  6.38e+05  7.08e+06
X             922       5.45e+03  5.89e+04  6.57e+05  7.3e+06
Y             922       5.45e+03  5.89e+04  6.57e+05  7.3e+06
Z             913       5.44e+03  5.89e+04  6.57e+05  7.3e+06
Pcat          1.21e+03  7.98e+03  9.03e+04  1.01e+06  1.12e+07
S             5.38e+03  3.53e+04  3.94e+05  4.38e+06  4.87e+07
T             5.38e+03  3.53e+04  3.94e+05  4.39e+06  4.87e+07

Table 9. Gate time (in ns) for surface code.

Operation(s)  d=3       d=7       d=21      d=51      d=101
EC            166       166       166       166       166
CNOT          6.71e+03  1.53e+04  4.56e+04  1.1e+05   2.18e+05
H             4.78e+03  1.09e+04  3.22e+04  7.8e+04   1.54e+05
P|+⟩          598       1.26e+03  3.59e+03  8.57e+03  1.69e+04
P|0⟩          514       1.18e+03  3.5e+03   8.48e+03  1.68e+04
Mx            16        16        16        16        16
Mz            10        10        10        10        10
S             2.3e+04   5.25e+04  1.56e+05  3.77e+05  7.45e+05
T             1.87e+04  4.27e+04  1.27e+05  3.07e+05  6.07e+05

Table 10. Gate count per error correction operation with Steane code.

Operation(s)  m=1       m=2       m=3       m=4       m=5
hCNOT         14        1.85e+03  3.27e+05  5.84e+07  1.04e+10
vCNOT         28        3.85e+03  6.76e+05  1.2e+08   2.15e+10
hSWAP         18        5.29e+03  9.7e+05   1.74e+08  3.11e+10
vSWAP         15        4.84e+03  8.58e+05  1.52e+08  2.71e+10
H             7         959       1.69e+05  3e+07     5.36e+09
P|+⟩          8         1.06e+03  1.87e+05  3.34e+07  5.95e+09
P|0⟩          12        1.58e+03  2.81e+05  5e+07     8.92e+09
Mx            20        2.64e+03  4.68e+05  8.34e+07  1.49e+10
Mz            0         0         0         0         0
X             0         0         0         0         0
Y             0         0         0         0         0
Z             0         0         0         0         0
Pcat          0         0         0         0         0
S             0         0         0         0         0
T             0         0         0         0         0

Table 11. Gate count per error correction operation with Bacon-Shor code.

Operation(s)  m=1       m=2       m=3       m=4       m=5
hCNOT         29        3.47e+03  5.47e+05  1.04e+08  2.09e+10
vCNOT         0         0         0         0         0
hSWAP         12        6.85e+03  1.66e+06  3.51e+08  7.24e+10
vSWAP         0         0         0         0         0
H             0         0         0         0         0
P|+⟩          9         927       1.52e+05  2.94e+07  5.94e+09
P|0⟩          9         1.14e+03  1.77e+05  3.34e+07  6.7e+09
Mx            9         792       1.34e+05  2.67e+07  5.43e+09
Mz            9         792       1.34e+05  2.67e+07  5.43e+09
X             1         82        1.44e+04  2.87e+06  5.85e+08
Y             0         0         0         0         0
Z             1         82        1.44e+04  2.87e+06  5.85e+08
Pcat          0         0         0         0         0
S             0         0         0         0         0
T             0         0         0         0         0

Table 12. Gate count per error correction operation with Knill/Steane code.

Operation(s)  m=1       m=2       m=3       m=4       m=5
hCNOT         14        1.12e+03  7.3e+04   4.98e+06  3.39e+08
vCNOT         28        1.06e+03  7.33e+04  4.98e+06  3.39e+08
hSWAP         18        1.85e+03  1.38e+05  9.51e+06  6.5e+08
vSWAP         15        2.22e+03  1.43e+05  9.63e+06  6.53e+08
H             7         28        112       448       1.79e+03
P|+⟩          8         696       4.77e+04  3.25e+06  2.21e+08
P|0⟩          12        696       4.77e+04  3.25e+06  2.21e+08
Mx            20        736       4.79e+04  3.25e+06  2.21e+08
Mz            0         656       4.76e+04  3.25e+06  2.21e+08
X             0         0         0         0         0
Y             0         0         0         0         0
Z             0         0         0         0         0
Pcat          0         0         0         0         0
S             0         0         0         0         0
T             0         0         0         0         0


Table 13. List of abbreviations.

Algorithms                  Technologies            Control
Binary Welded Tree: BWT     Quantum Dots: DOT       Primitive: PRI
Boolean Formula: BFA        Neutral Atoms: NEU      Optimal: OPT
Class Number: CNA           Photonics I: PH1        Solovay Kitaev: SOK
Ground State Est.: GSE      Photonics II: PH2       Trotter: TRO
Quant. Linear Syst.: QLS    Superconductors: SUP    Dynamically Cor. Gates: DCG
Shortest Vector: SVP        Ion Traps: TRA
Triangle Finding: TFP

Error Correction            Syndrome Extraction
Steane: STE                 Steane: STE
Bacon-Shor: BSH             Shor: SHO
Knill's scheme: C46         Knill: KNI
Surface code: TOP

Table 14. Final resource estimates.

Algorithm  Technology  Control  QECC  Syndrome     Qubits    Gates     Running    Concatenation level
                                      measurement                      time (ns)  or code distance
BWT        NEU         OPT      TOP   KNI          9.33e+10  6.08e+24  5.69e+18   83
BWT        NEU         OPT      TOP   SHO          1.87e+11  4.25e+25  6.06e+18   83
BWT        NEU         OPT      TOP   STE          6.22e+10  1.21e+25  5.98e+18   83
BWT        NEU         PRI      TOP   KNI
16  Conclusion

This report quantifies the resources needed to run a variety of quantum programs on quantum
computers with realistic properties. We quantify the number of physical qubits required to run
each program, the execution time on each of the physical technologies of choice, the probability
of success of the computation, and the gate count for each quantum gate type. In the
course of performing this resource estimation, we made a number of observations.
One of the most interesting is that the amounts of resources needed to perform topological
and concatenated quantum error correction are comparable. This is surprising because the
nature of the codes and the models of quantum computation are very different. However, we
still believe that topological error correction is superior in systems with a large number of
qubits. Due to its much higher error-correction threshold, topological error correction is
able to protect quantum information in systems that cannot be protected by concatenated
codes.
Appendix A

17.1  Implementation of the tile operations of the C4 code

In the following, "a → b" means applying a CNOT gate with a being the control qubit and b
being the target qubit, and "a ⇔ b" means applying a SWAP gate on qubits a and b.
Time step 1:
Time step 4:
Time step 5:
We choose the index such that a quantum teleportation occurs on the qubits d_i, a_i, a_{i+4}
for i = 1, 2, 3, 4. Observe that the data qubits d1, d2, d3, d4 are transferred to the center after
teleportation and no SWAPs are needed here. However, the error detection ED+ for structure
II needs two SWAPs and takes one more step. Its first time step is initialized as follows:
In addition, applying the Pauli operators X or Z to complete the teleportation may take
one or two more steps, but this is not shown. The ED0 for structure I at time step 1 is as
follows, and the rest of the steps are similar to those of the ED+:
We found that the operations and required time of ED0 are the same as those of ED+, so we
omit the subscripts 0 and +.

Remark:  after  a  logical  Hadamard  gate,  the  labels  of  data  qubits  2  and  3  are  switched. 
This  can  be  fixed  by  applying  appropriate  SWAPs  and  it  takes  2  more  time  steps  in  structures 
I  or  II.  However,  we  don’t  adjust  it  until  a  CNOT  gate  is  operating  on  two  tiles  with  different 
labels. 
Time step 6:
P|0⟩ and P|+⟩. The logical state preparation circuit P|0⟩ in Fig. 22: Time step 1:
The logical state preparation circuit P|+⟩ in Fig. 22: Time step 1:
Time step 2:
The decoding circuit of the C4 code when the spectator state is |0⟩ in Fig. 26: Time step 1:
Time step 3:
The decoding circuit when the spectator state is |+⟩ in Fig. 26: Time step 1:
Time step 3:


Time  step  4: 
S and T gates. Let |g1 g2 g3 g4⟩ denote the |+i⟩ state and |d1 d2 d3 d4⟩ be the data qubit in
Fig. 13.

Implementation  of  the  S  gate: 

Time  step  1: 
Time step 6: [tile diagram; the lattice layout is not recoverable. Labeled sites: g1, g2, d1, d2, d3, d4.]

Time step 7: [tile diagram; the lattice layout is not recoverable. Labeled sites: g1, g2, g3, g4, d1, d2, d3, d4.]




Time  step  10: 
Now let |g1 g2 g3 g4⟩ denote the T|+⟩ state and |d1 d2 d3 d4⟩ be the data qubit in Fig. 18.
Implementation of the T gate:

Time  step  1: 
18  Preparation of a logical state of Knill's post-selection scheme at the top level

Recall  that  the  tile  of  the  Steane  code  is  a  6  x  8  square  lattice  as  follows: 
(Note that |q1⟩, |q2⟩, |q3⟩, |q4⟩, |q5⟩, |q6⟩, and |q7⟩ in Fig. 24 are reordered as |d6⟩, |d7⟩,
|d1⟩, |d5⟩, |d4⟩, |d2⟩, and |d3⟩.) Preparation of the logical state |0⟩ of the concatenation of
C™, and Cec:
Time step 2:
Time step 3:
Time step 5:
Time step 6:
Time step 7:
Time step 8:
Time step 9: